Information sharing among heterogeneous information systems has been a major
challenge to the information processing community. Two major obstacles impeding
its realization are data heterogeneity and system heterogeneity. The traditional
schema integration approach cannot effectively resolve data heterogeneity problems
because it forces all users to view their data in the same way, as defined by an
integrated view. Furthermore, this approach is neither scalable nor extensible when
applied in a large-scale information system environment.

A promising approach to data heterogeneity problems is to perform run-time
mediation to convert one data representation to another to suit each user's view of
data. Despite some recent research efforts on information mediation, an effective
and efficient mediation technique is still needed. Such a mediation technique must
be scalable and extensible. It must also be compatible with the technique used to
resolve system heterogeneity problems.

In this research, we focus on information mediation in the context of global
query processing. We introduce an extensible common modeling language for
importing heterogeneous data representations into a common object-oriented modeling
environment. We also introduce a mediation specification language for specifying
the structural and semantic differences between the data representations of each
pair of heterogeneous systems and their resolution methods. The reusable mediation
specifications in a multilevel mediation hierarchy allow a mediated heterogeneous
information system to achieve extensibility and scalability. In an implemented
prototype system, mediation specifications are used to automatically generate
distributed mediators which, in conjunction with subquery processing components,
perform distributed query processing and information mediation among a number of
component systems over the CORBA communication infrastructure. In this system,
efficiency is gained by build-time compilation of mediation specifications into
distributed mediation code for execution at run-time.


CHAPTER  1 
INTRODUCTION 


The rapid advancement of communication network technology in recent years has
made it possible to browse and access data stored in different computers via
international networks such as the Internet. However, access to very specific
and useful information stored in disparate information systems, such as DBMSs and
file systems, is still not well supported. Consider the case of sharing global medical
information on organ donations. It is important and urgent that any hospital in the
world have access to the global information on available organs and donors in
order to arrange timely organ transplant operations for patients who need them. This
requires not only the interconnection of multiple, distributed, and heterogeneous
information systems but also a distributed query processing facility and an
information mediation mechanism to access data in dissimilar systems and resolve
the discrepancies among their data representations.

In an integrated heterogeneous information system, system and data
heterogeneities among component systems are the major obstacles to information
sharing. System heterogeneity problems result from the different hardware and
software platforms of component systems and the use of different information or data
models and information systems. The differences in hardware/software platforms
are usually resolved by using a common communication infrastructure (e.g., TCP/IP
sockets, RPC, CORBA's ORB, etc.). However, the differences in information and
data models are more difficult to deal with. It is not possible to force
all users to use the same data or information model for different applications, since
a data model which is suitable for one application domain may not be suitable for
another. Besides, people working in different application areas may have their own
preferences on which models to use. The solution to this problem is either to do
pair-wise translations of modeling constructs or to use a common or neutral model,
to and from which all data models used by the users are translated. The data model
heterogeneity problem needs to be resolved before any successful system integration
can be achieved.
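
The trade-off between the two translation strategies can be made concrete with a
small calculation. The sketch below assumes one directed translator per ordered
pair of models in the pair-wise case, and one translator into and one out of the
neutral model otherwise; the function names are illustrative only.

```python
def pairwise_translators(n: int) -> int:
    """Directed translators needed when every model maps to every other model."""
    return n * (n - 1)


def neutral_model_translators(n: int) -> int:
    """Translators needed when each model maps to and from one neutral model."""
    return 2 * n


# Compare the two counts for a few system sizes.
for n in (3, 5, 10, 50):
    print(n, pairwise_translators(n), neutral_model_translators(n))
```

The two counts coincide at three models; beyond that, the pair-wise count grows
quadratically while the neutral-model count grows linearly.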

Data heterogeneity, on the other hand, is due to different ways of modeling
data and different semantics and structural representations associated with data
values. Problems in this category have been thoroughly investigated and can be
generally categorized into two types: schematic heterogeneity and semantic
heterogeneity. Schematic heterogeneity is due to different ways of naming and
structuring data, whereas semantic heterogeneity is due to different representations
of data values. Because of these problems, queries issued against multiple
information sources cannot be processed directly and data returned from them cannot
be readily used. Query and data conversions are needed to bridge the naming,
structural and semantic gaps. Two basic approaches have been taken to deal with the
problems of data heterogeneity.

The traditional approach is to establish a global schema which reconciles the
naming and structural conflicts and unifies the semantic representations of
heterogeneous component systems [CHE88]. This integrated global schema is then shared
and referenced by the users to pose queries. Thus, all conflicts and differences are
resolved at the time of schema design and integration. The major drawback of this
approach is that the shared integrated schema forces the users to view data in the
same way instead of the ways that are familiar to them.

In contrast to the traditional schema integration approach, more recent thinking
is to "mediate" dissimilar data representations instead of "integrating" them. This
mediation approach typically uses mediation rules or specifications to resolve
various kinds of conflicts among component systems at run-time. A mediated
information system allows the users to see data in their own views.
They can issue queries based on their own views and receive data in representations
that are familiar to them. Furthermore, the mediation approach provides better
support for system extensibility and scalability because, unlike the schema integration
approach, adding new component systems to the heterogeneous system can be done
by changing or adding mediation rules or specifications instead of redesigning or
modifying the integrated schema. Due to the above advantages, we have adopted the
mediation approach in this research work.

Even though several recent mediation research efforts have reported some
interesting results, which shall be surveyed in the next chapter, two important
problems still need to be explored. First, how do we solve both system and data
heterogeneity problems in a unified framework? Second, how do we implement an
information modeling and mediation system that is efficient and scalable? Our
mediation research focuses on developing effective and efficient solutions to
these problems.

Our mediation strategy is outlined as follows. First, to resolve system and
data heterogeneities, the specifications of component systems and data are uniformly
modeled in an object-oriented, semantically-rich common modeling language, called
NCL. Second, to resolve data heterogeneities, we have designed a high-level mediation
specification language. It is used to capture the interrelationships among the data
specifications of component systems and the information for resolving both schematic
and semantic heterogeneities. This language can be applied in a multi-level
mediation hierarchy in which the upper-level mediation specification can be
incrementally generated by reusing its lower-level mediation specifications. Third,
the mediation specification is compiled to generate a set of object classes and some
implementations defined in NCL. These generated classes, which we shall call
"mediation elements", model the distributed query processor, mediators, and subquery
processors. They contain active mediation rules which are triggered to perform
mediation operations at run-time. Finally, the NCL specifications of the component
systems and the mediation elements are combined to form a mediated global schema,
which is then compiled by the NCL compiler to generate executable program code for
mediated distributed query processing.

The term "mediation", first proposed by Wiederhold [WIE92], was defined
very broadly [WIE94, WIE95]. The suggested services to be provided by the
mediation system may include heterogeneity reconciliations, query optimization,
security rule enforcement, negotiation, etc. However, as pointed out in the literature
[BRE90, CHAL94, CHAT91, GOH94, HAM93, KIM91, SU93, VEN91], the most
critical problem in a heterogeneous information system is the problem of resolving
various kinds of data heterogeneities. This problem has not been effectively solved,
especially in a large-scale system environment. Therefore, our mediation research
focuses on "information mediation" for resolving the data heterogeneity problem,
within the context of global query processing.

The remainder of this dissertation is organized as follows. Chapter 2 contains
a survey of related research works pertinent to information modeling and mediation
system design. Chapter 3 describes the general framework of this mediation system,
explaining the build-time system architecture, users' views in issuing queries and the
run-time communication architecture. Chapters 4 and 5 detail the common
modeling language, NCL, and the mediation specification language, MSL, respectively.
Chapter 6 explains the approach taken to implement the mediation system. It covers
the translator/compiler developments of the two languages. Chapter 7 describes the
run-time mediation rule execution and distributed query processing, and presents
some experimentation results using a query example. Chapter 8 summarizes the
main contributions of this research and discusses possible directions for future work.


CHAPTER  2 
SURVEY  OF  RELATED  WORK 


This  chapter  presents  a  survey  of  related  work  pertinent  to  this  research.  Since 
the  focus  of  this  research  is  on  information  modeling  and  mediation  techniques,  we 
shall  survey  the  related  works  in  these  two  areas  below. 

2.1    Information  Modeling 

Two independent standards efforts are particularly relevant to the development
of information modeling facilities; namely, the efforts of the Object Management
Group (OMG) and the International Organization for Standardization's committee on
the STandard for the Exchange of Product model data (ISO/STEP).

OMG is a consortium of over five hundred industrial companies which
aims to define and develop object-oriented technologies for achieving interoperability
among dissimilar computing platforms. It developed the Common Object Request
Broker Architecture (CORBA) [OMG91] for achieving object interoperability. The
interfaces of all objects of interest are specified in a common Interface Definition
Language (IDL) [SOM93, OMG91]. An interface specification is compiled to generate
program skeletons and stubs for inclusion into the server programs and the client
programs, respectively. At run-time, a software or human client can make requests
for object services, which are dispatched to the proper servers in a heterogeneous
network, thus achieving client/server interoperability.

The  ISO/STEP  community,  on  the  other  hand,  emphasizes  the  development 
of  standards  for  product  modeling  and  product  data  exchange.  One  of  its  major 
efforts  is  the  development  of  an  information  modeling  language  named  EXPRESS 
[ISO92]. EXPRESS provides a rich set of constraint specifications by using keywords,
functions,  procedures  and  constraint  rules.  It  is  a  powerful  information  modeling 
language.  The  language  has  been  widely  accepted  and  used  by  a  number  of  product 
design  and  manufacturing  communities. 

In spite of the individual success and acceptance of these two standards efforts,
the result produced by each does not adequately solve the data sharing and program
interoperability problems found in a heterogeneous environment. For example, the
underlying object model of OMG's IDL is that of C++. While it may be adequate for
achieving program interoperability (since the underlying data models of most of the
existing object-oriented programming languages have modeling power similar to that
of IDL), the semantics captured by IDL is not rich enough for modeling complex objects
processed by many existing application systems. When IDL is used as the common
modeling language to model and encapsulate the objects and object services of an
existing application system (e.g., a relational database application or a CAD
application), much of the semantics of the objects cannot be captured explicitly because
IDL does not have the necessary modeling constructs for capturing constraints by
keywords and/or by integrity rules. Thus, much of the needed semantics has to be
embedded in the application code. Recognizing the limitations of the object model
underlying most of the existing object-oriented DBMSs, the Object Data Management
Group (ODMG) [ODM93] and others (e.g., [FLO95]) have made some effort to
extend the object model's capabilities to capture some semantic constraints of data
such as "key" and "inverse attribute" constraints. However, these extensions are
still far from meeting the actual needs for modeling complex objects found in many
application domains.

On  the  other  hand,  although  EXPRESS  is  semantically  much  richer  than  IDL 
and  has  an  object-oriented  flavor,  it  is  not  an  object-oriented  information  modeling 
language  because  it  does  not  support  the  encapsulation  of  behavioral  properties  of 
objects.  Unlike  methods  found  in  an  object-oriented  model,  functions  and  procedures 
defined  in  EXPRESS  are  global  properties  in  a  schema  and  are  used  in  rules  for 
constraint  specifications. 

Our information modeling language is designed to combine the behavioral
specification of IDL and the information modeling power of EXPRESS into a single
well-integrated object model and modeling language. The resulting object model
and language can be ideal for modeling objects and object services in a network of
heterogeneous computing systems. The language can be designed to conform to the
two standard languages (IDL and EXPRESS) semantically and, as much as possible,
syntactically. In addition, our modeling language has additional new features
including more semantic associations and constraints, and behavioral modeling using
Event-Condition-Action-AlternativeAction rules.
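
To make the Event-Condition-Action-AlternativeAction idea concrete, the following
is a minimal sketch in Python rather than in the modeling language itself. The
class `ECAARule`, its field names, and the sample event name are assumptions made
for illustration; they do not reproduce the language's actual rule syntax.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Context = Dict[str, Any]


@dataclass
class ECAARule:
    """Hypothetical Event-Condition-Action-AlternativeAction rule."""
    event: str
    condition: Callable[[Context], bool]
    action: Callable[[Context], Any]
    alternative_action: Callable[[Context], Any]

    def fire(self, event: str, ctx: Context) -> Any:
        # The rule is only triggered by the event it was declared for.
        if event != self.event:
            return None
        # Condition true: run the action; otherwise run the alternative action.
        if self.condition(ctx):
            return self.action(ctx)
        return self.alternative_action(ctx)


# Illustrative rule: accept non-negative weights, reject the rest.
rule = ECAARule(
    event="after_update_weight",
    condition=lambda ctx: ctx["weight"] >= 0,
    action=lambda ctx: "accepted",
    alternative_action=lambda ctx: "rejected",
)
```

Calling `rule.fire("after_update_weight", {"weight": -1})` takes the alternative
branch, while an unrelated event leaves the rule untriggered.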

2.2    Mediation  System

Several research efforts on information mediation have been reported in recent
years [AMB94, CHAW94, GOH94, LU95, FLO96, SAU96]. They can be categorized
into two types. One type aims to mediate the differences between the integrated
global view and the views of component systems. In the systems reported in [AMB94,
CHAW94, GOH94, LU95, FLO96], a global integrated schema is used to model the
global data resources and the mediation is between the component systems' data
representations and the global data representation. The approach requires only 2N
mediation mappings, where N is the number of component systems. This approach
is very similar to the traditional integrated approach except that some kind of
mediation specification is used to facilitate data translations. The other type of
mediation work [QIA93, SAU96] mediates the differences between the views of each
pair of component systems. It emphasizes support for users' preferences in viewing
data in their own ways. It has the advantages of extensibility and scalability
addressed previously. Our mediation work belongs to the second type.

In terms of mediation knowledge representation, pattern-based logic rules [AMB94,
CHAW94, GOH94, LU95, FLO96] have been widely used by different research groups
in the A.I. community. However, these logic rules are low-level in their representations
and are difficult for the user to understand, modify and maintain. Their execution
is based on a repetitive inferencing process which, generally speaking, is not very
efficient. Their underlying inference engines are based on a run-time interpretation
of rules, which makes it difficult to achieve high performance. In our work, we use a
high-level mediation specification language for better readability and maintainability.
Its implementation is based on a compilation approach to generate event-triggered
mediation rules. The mediation rules are, in turn, compiled to generate execution
code to carry out the mediation process at run-time. Compared to the interpretation
approach, this compilation approach can be much more efficient.
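
The difference between interpreting rules at run-time and compiling them at
build-time can be illustrated with a toy analogy. The sketch below is not the
compiler described in this work: the rule format, the `CONVERSIONS` table, and the
`lb_to_kg` conversion are hypothetical stand-ins chosen only to show where the
lookup cost moves.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Toy rule: (source attribute, target attribute, conversion name).
Rule = Tuple[str, str, str]

# Hypothetical conversion functions, looked up by name.
CONVERSIONS: Dict[str, Callable[[Any], Any]] = {
    "identity": lambda v: v,
    "lb_to_kg": lambda v: round(float(v) * 0.45359237, 3),
}

RULES: List[Rule] = [("b1", "a1", "identity"), ("b3", "a3", "lb_to_kg")]


def interpret(rules: List[Rule], record: Record) -> Record:
    """Run-time interpretation: each call walks the rules and resolves names."""
    out: Record = {}
    for src, dst, conv in rules:
        out[dst] = CONVERSIONS[conv](record[src])
    return out


def compile_rules(rules: List[Rule]) -> Callable[[Record], Record]:
    """Build-time compilation: resolve conversions once, return a closure."""
    plan = [(src, dst, CONVERSIONS[conv]) for src, dst, conv in rules]

    def mediate(record: Record) -> Record:
        return {dst: fn(record[src]) for src, dst, fn in plan}

    return mediate


mediate = compile_rules(RULES)  # done once, before any query arrives
```

Both paths produce the same converted record; the compiled closure simply avoids
repeating the rule resolution on every call.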

Almost all the above mediation works store and manage their mediation rules
or specifications in a centralized system. This strategy is not suitable for managing
a large number of rules. When the number of component systems becomes large,
adding or modifying rules is an expensive process. A better strategy [CHAW94] is to
maintain the mediation rules in a distributed fashion within a hierarchical structure.
Each mediator in the structure integrates its next-level systems and exports an
integrated view to its upper-level mediators. That is, each mediator only performs
the mediation between its integrated view and the views of its next-level systems.
The mediation rules are thus distributed in different mediators and the complexity
of rule management is reduced. However, there are two major shortcomings in this
approach. First, the problems of the integrated view are not resolved. Second, due to
the point-to-point communication between each mediator and its children, the efficiency
of query processing is affected since queries issued against the upper-level mediators
may need to be passed via many intermediate mediators to reach the information
sources. Our work remedies the two shortcomings by using a different hierarchical
mediation strategy. Instead of creating multiple integrated views, we build up a
mediated global schema based on an object-oriented hierarchy. Unlike in [CHAW94],
each mediation component in this hierarchy contains mediation classes to connect
and mediate related entities of its underlying component systems, without exporting
an integrated view. Since no view integration is performed, users' preferences
on viewing data are not affected. In addition, to improve the query processing
efficiency, the translation of the mediation specification is done by generating
executable code which directly dispatches queries to the information sources.
Furthermore, to achieve better performance in distributed query processing, the
mediation method code generated by the compiler is distributed in multiple mediators,
each of which is linked with the code of an information source. Unlike most of the
mediation works, which overload a centralized mediator, the mediation tasks are
distributed among and processed by component systems that contain the relevant data.


CHAPTER  3 

A  GENERAL  FRAMEWORK  OF  THE  MEDIATION  SYSTEM 


This  chapter  presents  a  general  framework  of  the  mediation  system.  Section 
3.1  describes  the  system  architecture  from  the  build-time  perspective.  Then,  based 
on  the  architecture,  Section  3.2  explains  the  users'  views  of  issuing  global  queries. 
Section  3.3  describes  the  CORBA-based  run-time  query  processing  environment. 

3.1    System  Architecture 

Figure 3.1 depicts the general build-time architecture of our information
mediation system. As shown in the lower part of the figure, component systems could be
heterogeneous information sources, such as databases modeled using different data
models (e.g., relational, hierarchical, network, object-oriented, etc.) or file systems.
To resolve the problem of dissimilar data models, an object-oriented (O-O) common
modeling language is used to uniformly model the data and software resources of these
component systems to form a set of component schemas. A wrapper is used to do the
mapping between the object-oriented representation and the native representation of
each component system.

A high-level information mediation language is then used to specify the
differences and similarities of each pair of component schemas and the ways to mediate
their differences. In the mediation specifications, mediation classes are defined over
the related entity classes of different component schemas, and mediation clauses are
used to specify their interrelationships and the conversion methods needed to mediate
their differences.

Figure 3.1. System Architecture of Schema Mediation


The component schemas and the additional mediation classes defined in the
mediation specifications form a mediated global schema (as shown in the dotted
rectangle). This mediated global schema allows users to have their own views of data
because each component schema is a part of the mediated global schema. The users
of different component systems can issue global queries based on the views that are
familiar to them.

3.2    Users' Views for Issuing Queries

In the mediated global schema, there are two ways of issuing global queries to
retrieve data from multiple information sources. One way is based on the component
system schemas; the other way is based on the mediation classes defined in the
mediation specifications.

Figure 3.2. Example of Creating a Mediation Class on Top of Two Related Classes


To illustrate these two views for issuing global queries, we use the example
schema given in Figure 3.2, which contains a mediation class SP defined on top of two
related classes, A and B, in component schemas DB_1 and DB_2, respectively. The M
association from the mediation class SP to classes A and B is conceptually similar to
the generalization (i.e., G association). The only difference is that the mediation class
SP using an M association upward inherits all attributes of its constituent classes (i.e.,
A and B). In this example, three attributes (a1, a2 and a3) of class A are equivalent to
three corresponding attributes (b1, b2 and b3) of class B, even though their attribute
names are distinct. Attribute a4 of class A and attribute b4 of class B are distinct
properties of these two classes. A user can query the system based on the following
two views.

Query  Based  on  Component  Schemas 

A global query can be issued based on the view of either DB_1 or DB_2. Below
is a global query issued based on DB_1's view. The query is posed in an extended
version of the object query language (OQL) reported in [ALA89].

CONTEXT c:DB_1::A
WHERE c.a3 >= "100"
RETRIEVE c.a1, c.a3, c.a4

This query retrieves data relevant to the attributes a1, a3 and a4 of class A
with the constraint A.a3 >= "100". In the query, c is a range variable representing
all instances which are accessible from all the component systems based on the view
of class A in DB_1. Since the mediation specification specifies that objects in DB_1::A
and DB_2::B are related objects belonging to the same mediation class even though
they have different attribute names, they should all be retrieved based on the view of
DB_1. To achieve this, a subquery is generated to retrieve data of class B in DB_2.
To enable the subquery processing in the component system DB_2, a mediator which
is generated based on the mediation specification is linked with the legacy system
that manages DB_2. The mediator modifies the subquery to meet the naming
and structural requirements of DB_2 so that it can be processed by the legacy system.
It also converts the data retrieved from DB_2 into the data specification of DB_1
before the data are merged with the data retrieved from DB_1. The merged data form
the final query result. Similarly, a global query can be issued based on DB_2's view
to retrieve data that are stored in DB_1.
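
The mediator behaviour described above (rewrite the subquery into DB_2's names,
run it, convert the rows back into DB_1's view, then merge) can be sketched as
follows. This is a simplified in-memory illustration, not the generated mediator
code: the sample rows are invented, a3 and b3 are treated as numbers rather than
the quoted "100" of the query, and returning None for a4 on rows that originate
in class B is an assumption.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Attribute equivalences from the Figure 3.2 example (a1~b1, a2~b2, a3~b3).
A_TO_B: Dict[str, str] = {"a1": "b1", "a2": "b2", "a3": "b3"}

# Hypothetical sample data standing in for the two component systems.
DB_1_A: List[Row] = [
    {"a1": "x1", "a2": 1, "a3": 150, "a4": "p"},
    {"a1": "x2", "a2": 2, "a3": 50, "a4": "q"},
]
DB_2_B: List[Row] = [
    {"b1": "y1", "b2": 3, "b3": 200, "b4": "r"},
    {"b1": "y2", "b2": 4, "b3": 10, "b4": "s"},
]


def query_db1(min_a3: int, attrs: List[str]) -> List[Row]:
    """Evaluate the query locally against DB_1::A."""
    return [{a: row[a] for a in attrs} for row in DB_1_A if row["a3"] >= min_a3]


def mediated_query_db2(min_a3: int, attrs: List[str]) -> List[Row]:
    """Rewrite the subquery into DB_2's names, run it, and convert rows back."""
    filter_attr = A_TO_B["a3"]  # the condition on a3 becomes a condition on b3
    matches = [row for row in DB_2_B if row[filter_attr] >= min_a3]
    converted: List[Row] = []
    for row in matches:
        # Attributes with no counterpart in B (such as a4) come back as None.
        converted.append(
            {a: row[A_TO_B[a]] if a in A_TO_B else None for a in attrs}
        )
    return converted


def global_query(min_a3: int, attrs: List[str]) -> List[Row]:
    """Merge local and mediated results into one answer in DB_1's view."""
    return query_db1(min_a3, attrs) + mediated_query_db2(min_a3, attrs)


result = global_query(100, ["a1", "a3", "a4"])
```

Here `result` holds one row from DB_1::A and one converted row from DB_2::B, both
expressed with DB_1's attribute names.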

Query  Based  on  Mediation  Extension 

Based on the preceding view, the attribute b4 of DB_2 is not visible since it is
not related to class A. To retrieve this or other distinct attributes of DB_2, the user's
view should move up to the mediation class (i.e., SP), which upward inherits all the
attributes of its related constituent classes. The following OQL query issued against
the mediation class SP would retrieve A.a1, A.a3 and values of distinct attributes
(i.e., DB_1::A.a4 and DB_2::B.b4).

CONTEXT c:MED::SP
WHERE c\DB_1::A.a3 >= "100"
RETRIEVE c\DB_1::A.a1, c\DB_1::A.a3, c\DB_1::A.a4, c\DB_2::B.b4

The backslash symbol used in the query is a group identifier used to indicate from
which classes the attributes are to be retrieved. It avoids the ambiguity resulting
from having the same attribute names in different classes with different meanings
(i.e., homonyms). Note that all the upward inherited attributes of the mediation
class SP are visible to a query issued against SP. Thus, for those equivalent attributes
(e.g., a1 and b1) upward inherited by the mediation class, the user can choose the
attribute representation in which to see the data by specifying the proper attribute
names in the RETRIEVE clause of the query. For example, in the RETRIEVE clause of
the above query, c\DB_1::A.a1 can be changed to c\DB_2::B.b1 to retrieve the data in
b1's representation.
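
As an illustration of how such group-qualified references can be taken apart, the
sketch below parses the form "range variable \ schema :: class . attribute" used in
the query above. The parser, the `QualifiedAttr` type and the regular expression are
hypothetical helpers written for this example; they are not part of the query
processor described in this work.

```python
import re
from typing import NamedTuple


class QualifiedAttr(NamedTuple):
    range_var: str
    schema: str
    cls: str
    attr: str


# Shape: range_var \ schema :: class . attribute
_QUALIFIED = re.compile(r"^(\w+)\\(\w+)::(\w+)\.(\w+)$")


def parse_qualified(ref: str) -> QualifiedAttr:
    """Split a group-qualified attribute reference into its four parts."""
    match = _QUALIFIED.match(ref.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a qualified attribute reference: {ref!r}")
    return QualifiedAttr(*match.groups())


refs = [r"c\DB_1::A.a1", r"c\DB_2::B.b4"]
parsed = [parse_qualified(r) for r in refs]
```

With the schema and class made explicit, a reference to a1 in DB_1::A can be told
apart from an attribute of the same name in another class.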

3.3    CORBA-based  Client/Server  Query  Processing 

In a heterogeneous information system, component systems are physically
distributed and interconnected by a communication network. In our work, the
information mediation system uses CORBA's Object Request Broker (ORB) as the
communication infrastructure.

Figure 3.3. CORBA-based Mediation System Architecture


As shown in Figure 3.3, component systems communicate with each other via
an ORB. In this communication infrastructure, the user or application program can
request services from other component systems by message passing to invoke methods.
The method being invoked can exist either locally or remotely. The responsibility
of the ORB is to dispatch the messages to the appropriate servers in a transparent
manner. The activated method may again invoke other local or remote methods to
satisfy the service requirements.

Our information mediation system, as shown in Figure 3.3, consists of a
Distributed Query Processor (DQP), a client, a KBMS and information sources (i.e.,
DB_1, DB_2 and DB_3) whose program codes are linked with generated Subquery
Processors (SQPs) and Mediators. At build-time, each of them provides the interfaces
to its services, which are defined in an NCL component schema. A mediated global
schema combining these NCL component schemas is developed and compiled to
generate program bindings which include implementation program skeletons and stub
header files. The program skeletons are used by the servers into which the program
code is inserted, whereas the stub header files are included in the client programs
which request these services. Then, the programs of both server and client sides are
compiled and dynamically linked together in the ORB network.

At  run-time,  the  user  or  application  program  can  issue  a  global  query  as  a 
parameter  value  to  the  DQP  by  calling  its  key  method  (global_query_execution  in  our 
implementation).  Once  this  method  is  invoked,  the  processing  of  the  global  query 
is  achieved  by  the  collaboration  of  these  component  systems  (i.e.,  DQP,  KBMS,  and 
SQPs  and  Mediators),  each  of  which  provides  part  of  the  distributed  query  processing 
and  mediation  services. 
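
The run-time collaboration described above can be pictured as a fan-out of the global query to the subquery processors, followed by a gathering of their partial results. The following Python fragment is only a minimal sketch of that pattern. The class names, the predicate-style query, and the sample rows are assumptions made for illustration; the ORB transport, the KBMS, and the Mediators are deliberately left out, and only the method name global_query_execution is taken from the text.

```python
class SubqueryProcessor:
    """Hypothetical stand-in for a generated SQP linked with one information source."""

    def __init__(self, source_name, rows):
        self.source_name = source_name
        self.rows = rows

    def execute(self, predicate):
        # Evaluate the subquery locally and tag each row with its source.
        return [dict(row, source=self.source_name)
                for row in self.rows if predicate(row)]


class DistributedQueryProcessor:
    """Hypothetical stand-in for the DQP; the ORB transport is omitted."""

    def __init__(self, subquery_processors):
        self.subquery_processors = subquery_processors

    def global_query_execution(self, predicate):
        # Fan the query out to every SQP and gather the partial results.
        results = []
        for sqp in self.subquery_processors:
            results.extend(sqp.execute(predicate))
        return results


# Made-up sample rows, used only to exercise the sketch.
dqp = DistributedQueryProcessor([
    SubqueryProcessor("DB_2", [{"Date": "1/20/95", "IBM": 50.0}]),
    SubqueryProcessor("DB_3", [{"Date": "1/20/95", "StockPrice": 50.0},
                               {"Date": "1/21/95", "StockPrice": 51.0}]),
])
answer = dqp.global_query_execution(lambda row: row["Date"] == "1/20/95")
print(answer)
```

In the actual system each call to a subquery processor would be a remote method invocation dispatched by the ORB, and the Mediators would convert names and values before and after each subquery.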


CHAPTER  4 
NCL:  THE  COMMON  MODELING  LANGUAGE 


This chapter presents a common modeling language, NCL, for modeling data and
software modules of component systems in a heterogeneous network. The language
was developed under a DARPA-funded project named National Industrial
Information Infrastructure Protocols (NIIIP), and the language is thus called the NIIIP
Common Language (NCL). Section 4.1 first explains the need for a common modeling
language. Then, the description of the common language, including its features and
syntax, is covered in Section 4.2. Section 4.3 gives an example of using this language
to model component systems.

4.1    Need  of  a  Common  Modeling  Language 

In order to resolve the data/information model heterogeneity problems, a widely
used approach employs wrappers. A wrapper is a program which translates the modeling
constructs of one data/information model to those of another. If a heterogeneous
information system has N component systems which use N different data/information
models, one approach is to write N*(N-1) wrappers to do pairwise translations. However,
this does not scale, because the number of wrappers grows quadratically with N. A
better and commonly used approach is to introduce a common or neutral model, to and
from which the N models are translated. This approach requires only 2N wrappers. It
is adopted in our work.
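
The difference between the two wrapper counts can be checked with a few lines of arithmetic. The Python sketch below simply evaluates the two formulas from the paragraph above; the function names are chosen here for illustration only.

```python
def pairwise_wrappers(n: int) -> int:
    """One wrapper per ordered pair of distinct models: N*(N-1)."""
    return n * (n - 1)


def common_model_wrappers(n: int) -> int:
    """One wrapper to and one wrapper from the common model per model: 2N."""
    return 2 * n


for n in (3, 5, 10, 50):
    print(n, pairwise_wrappers(n), common_model_wrappers(n))
```

The two counts coincide at N = 3 (six wrappers each); for every larger N the common-model approach needs fewer wrappers, for example 20 instead of 90 when N = 10.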

Figure 4.1. Using NCL to Model Heterogeneous Information Resources


To model the information resources of different component systems, the common
model needs to be semantically richer than the data/information models used by the
component systems to avoid semantic losses in translations. Its modeling language
should conform as much as possible to the languages introduced by the standards
community. In our work on mediation, we have designed a common modeling language,
named NCL [SU96], to model the information resources of component systems, as
illustrated in Figure 4.1.

4.2    Description  of  NCL 

NCL is an integration of the language features of CORBA's IDL [SOM93],
ISO's EXPRESS [ISO92] and K.3 [SHY96, ARR97]. K.3 is the third version of
an implemented knowledge base programming language developed at the University
of Florida. NCL's underlying object model is the extensible object model of K.3,
which is founded on the concept of objects and object associations introduced in the
Object-oriented Semantic Association Model (OSAM* [SU89]) and its algebra and
calculus [SU93, KAM94]. NCL uses 1) the method specification facility of IDL, 2)
the type and entity specifications and the keyword constraints of EXPRESS, and 3)
the knowledge rule specification, language extensibility features, and association type
specifications of K.3. The overall structure of NCL is shown below and its complete
BNF syntax rules can be found in Appendix A.


(* SCHEMA declaration. *)
(* The SCHEMA class has an inclusion relationship with its component class
   types *)
DEFINE SCHEMA schema_id;
END_DEFINE;

(* TYPE declaration *)
DEFINE TYPE type_id = underlying_type IN schema_id;
    WHERE    (* domain rule in TYPE *)
        rule_label_1: expression_1;
    METHODS:
        EXCEPTION exception_id (var_1:type_1; ..);
        METHOD [ONEWAY] method_id
            ([IN | OUT | INOUT] para_id:para_type; ...) : return_type
            [RAISES (exception_id, ...)];
END_DEFINE;

(* ENTITY declaration *)
DEFINE ENTITY entity_id IN schema_id;
    SUPERTYPE OF (supertype_expression)    (* supertype declaration *)
    SUBTYPE OF (subtype_list)              (* subtype declaration *)
    attr_id: [OPTIONAL] base_type [WHERE ([TOTAL]
                                   [CARDINALITY([L1:U1]:[L2:U2])]) ];
    DERIVE
    INVERSE
    UNIQUE
    [WHERE    (* domain rule in ENTITY *)
        rule_label_1: rule_expression_1;
    ASSOCIATIONS:    (* Other association types *)
        INTERACTION OF (attr_link_1:Entity_1; attr_link_2:Entity_2; ...)
        CARDINALITY
            (attr_link_1:attr_link_2 = [L1:U1]:[L2:U2]; ...);
    METHODS:    (* method declaration *)
        EXCEPTION exception_id (var_1:type_1; ..);
        METHOD [ONEWAY] method_id
            ([IN | OUT | INOUT] para_id:para_type; ...) : return_type
            [RAISES (exception_id, ...)];
    (* Local rule declaration *)
    RULES:
        RULE rule_id;
            [TRIGGERED triggered_time trigger_operation, triggered_time
                       trigger_operation, ]
            [CONDITION condition_clause]
            [ACTION
                statement_list]
            [OTHERWISE
                statement_list]
        END_RULE;
END_DEFINE;

(* Global RULE declaration *)
RULE rule_id;
    [TRIGGERED triggered_time trigger_operation, triggered_time
               trigger_operation, ]
    [CONDITION condition_clause]
    [ACTION
        statement_list]
    [OTHERWISE
        statement_list]
END_RULE;

(* Method implementation *)
METHOD [class_id::]method_id;
    [LOCAL
        local_var_declaration;
    END_LOCAL;]
    [statement_list]
END_METHOD;

The  above  structure  gives  the  skeletal  specifications  of  schema,  type,  entity,  global 
rule,  and  method  implementation.  The  symbols  and  clauses  enclosed  in  a  pair  of 
brackets  [  ]  can  be  optional  when  other  mandatory  or  optional  symbols  and  clauses 
are  present.  The  syntax  of  NCL  resembles  that  of  EXPRESS;  however,  minor  changes 
have  been  made  to  EXPRESS  in  order  to  introduce  the  language  extensibility  feature 
and  other  additional  features  of  NCL. 

NCL provides the constructs for defining schemas, data types and entity types.
Their definitions are enclosed by a pair of keywords, DEFINE and END_DEFINE. In
NCL, a schema is treated as a first-class object. The information resources of different
component systems are defined by separate schemas, and their interrelationships are
specified by associations among these schema objects, in much the same way as the
associations among data objects. In NCL, ENTITY, TYPE, SCHEMA, and any
other new class types are treated as identifiers (i.e., a parameter of the keyword
DEFINE) instead of keywords. The keyword DEFINE signals to the compiler that the
identifier following it should match a class type defined in the meta-model. Other
class types can be added to the meta-model by the knowledge base customizer, who
customizes the extensible object model to meet the modeling needs of an application
domain. Any identifier can be used as the name of a class type as long as the name
used in NCL is consistent with the name used in the meta-model in which the class
type is defined. Thus, by adding new class types to the meta-model, we can extend
the modeling capability of NCL and achieve NCL's class-type extensibility.

The  definitions  of  TYPE  and  ENTITY  are  similar  to  those  of  EXPRESS  except 
that  their  semantic  contents  have  been  enriched.  In  the  TYPE  specification,  the 
behavioral  properties  of  a  data  type  are  specified  in  terms  of  methods  which  have  the 
same  semantic  contents  as  IDL's  interface  specifications. 

In the definition of an ENTITY, the SUPERTYPE/SUBTYPE specification is
the same as in EXPRESS. The attribute specification and the frequently used constraints
defined by keywords (e.g., UNIQUE, OPTIONAL, DERIVE, and INVERSE, which
we shall call "keyword constraints") are also the same as in EXPRESS. Additional
constraints, such as 1) TOTAL (i.e., the total participation constraint, which specifies
that all the objects in the class from which the attribute draws its values have to
associate with some objects defined by the attribute), 2) CARDINALITY (i.e.,
cardinality mappings between the class in which the attribute is defined and the class
from which the attribute draws its values), and 3) other user-defined constraints
associated with the attribute, are specified as identifiers which follow the keyword
WHERE. Constraints which are applicable to the entire entity class are also defined
by identifiers following the keyword WHERE, as shown in the line with the comment
(* domain rule in ENTITY *). All the keyword constraints used in the language must
match the constraint types defined in a meta-class Constraint Type of the meta-model.
In that meta-class, one or more parameterized rules are used to define the semantics of
a keyword constraint type. For example, the constraint type called TOTAL is defined
by a parameterized rule which formally specifies that all the objects of the underlying
domain of an attribute have to be the attribute values of some entity instances. When
a schema is compiled, the rules are bound to those attributes or object classes that
use the constraint type. If a new constraint type and its semantic specification in
terms of parameterized rules have been added to the meta-model by the knowledge
base customizer, the constraint type can be used in a schema declaration without
having to change the NCL compiler.
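
The idea of a parameterized rule that is bound to an attribute at schema compilation time can be illustrated with a closure. The Python sketch below is not the K.3 or NCL rule syntax; the registry CONSTRAINT_TYPES, the function total_rule, and the department/employee data are all assumptions introduced here to show the TOTAL semantics described above.

```python
def total_rule(attribute):
    """Bind the parameterized TOTAL rule to one attribute name."""
    def check(domain_objects, entity_instances):
        used = {inst[attribute] for inst in entity_instances
                if inst.get(attribute) is not None}
        # TOTAL holds when every domain object is some instance's attribute value.
        return set(domain_objects) <= used
    return check


# Hypothetical registry playing the role of the constraint-type meta-class.
CONSTRAINT_TYPES = {"TOTAL": total_rule}

# "Schema compilation": bind the rule to the attribute that uses the constraint.
dept_is_total = CONSTRAINT_TYPES["TOTAL"]("dept")

departments = ["D1", "D2"]
employees = [{"name": "a", "dept": "D1"}, {"name": "b", "dept": "D1"}]
violated_case = dept_is_total(departments, employees)   # D2 has no employee

employees.append({"name": "c", "dept": "D2"})
satisfied_case = dept_is_total(departments, employees)
print(violated_case, satisfied_case)
```

Adding a new constraint type in this picture means adding one more entry to the registry, which mirrors the claim that the compiler itself does not need to change.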

In an entity class declaration, a number of methods and knowledge rules can be
defined for processing the instances of the entity class. The method
and exception specifications are borrowed from IDL; however, the syntax has been
changed to conform to the syntax of K.3. The rule specification is borrowed from K.3.
Unlike the constraint rules of EXPRESS, which are used for specifying inter-attribute
constraints, rules in NCL are Event-Condition-Action-AlternativeAction
rules (or ECAA rules). An ECAA rule contains 1) an event (or trigger) specification,
2) a condition specification which may involve the verification of a complex pattern of
object interconnections in multiple object classes and/or a complex quantified expression
that involves multiple attributes of different classes, 3) an action specification
which specifies a list of system-defined operations (i.e., retrieval and manipulation
operations) and/or user-defined operations (i.e., methods) that should be processed
if the condition evaluates to True, and 4) an alternative action specification whose
operations are to be processed if the condition evaluates to False.
The CAA parts of a rule can be triggered Before or Immediate-after the occurrence
of a trigger operation specified in the event part of the rule, or After a transaction
is ready to commit. A triggered operation may in turn trigger other rules.
The automatic enforcement of these rules makes the underlying system active. It has
been widely recognized that the concepts and techniques of agents [FIN93, GEN94,
LAN92], mediators [WIE92, WIE94, WIE95] and negotiators [LAN93, MOE92] are
very useful for achieving information and program sharing in a complex, heterogeneous
computing environment. The rule specification mechanism of NCL is useful
for implementing agents, mediators and negotiators in such an environment because
their implementations can make use of the active capability of monitoring events and
automatically causing some intelligent behaviors to be carried out. It should also be
emphasized that the rule specification language of NCL is far more powerful than
the constraint rule of EXPRESS because the latter has neither the event/trigger
specification capability nor the ability to specify complex conditions which involve the
processing of the attributes and object instances of multiple classes. Although EXPRESS
allows the inclusion of functions and procedures in constraint specifications
(e.g., f(x) = 2), they are not operations or methods that can be activated based on
the result of a condition evaluation.

The ASSOCIATION specification in the ENTITY declaration is borrowed from
K.3. EXPRESS has the "Generalization" association in the form of the SUPERTYPE/
SUBTYPE construct and the "Aggregation" association in the form of attributes;
however, different enterprises may have the need to model different types of objects
and their interrelationships. For example, it is important to model a workflow
in terms of control association types such as Sequential, Parallel, Synchronization,
Dataflow, etc., that connect the activities or processes in a workflow model. The
ASSOCIATION specification allows different association types to be defined in an entity
class. In the skeletal structure shown above, INTERACTION is treated by the compiler
as an identifier which matches the definition of this association type in the meta-model.
In addition to the Interaction association, other association types and their semantic
properties can be defined by the knowledge base customizer in the meta-model so that
they can be used by the users of NCL. Similar to class types and keyword constraint
types, all association types are defined in the meta-model in terms of parameterized
rules. These rules are converted into bound rules at schema compilation time and
incorporated into the object classes that make use of the association types. At run-time,
these rules are used to enforce the semantics of the association types. This
feature of association-type extensibility allows any semantic relationships between or
among object classes (thus, among their instances) that are frequently used in an
application to be defined as association types. Once they are defined, the user can use
these association types to relate object classes without having to repeatedly specify their
rules in all the classes that participate in these types of associations.

Global rules are those rules which are applicable to all objects. If some rules are
only applicable to the objects of some classes, they can be defined in a superclass of
these classes and be inherited by them. This is similar to the definition of common
attributes and/or methods in the superclass of a number of subclasses.

The method implementation provides the actual program code for implementing
the corresponding method specification. It can be coded in any programming language.
In our work, we adopted the programming language constructs of K.3 in NCL
and used them to implement methods. Therefore, with the definition facilities shown
in the above structural declaration and the programming language facilities adopted
from K.3, NCL is a full-fledged, high-level, object-oriented programming language.

4.3    Example  of  NCL  Schemas 

We shall use an example to show how NCL is used to model component
systems. Because our emphasis is on information mediation, the example given below is
not intended to reflect the full capabilities of NCL.

As shown in Figure 4.2, three component systems, DB_1, DB_2 and DB_3, contain
related stock information. Each of them contains an ENTITY class that is related to
the ENTITY classes of the other two. In DB_1, ENTITY class Stock has three attributes,
Date, StkCode and TradePrice, where Date and StkCode form a composite key. In DB_2,
an ENTITY class with the same name, Stock, also has three attributes, Date, HP and IBM,
where Date is the key attribute. Attributes HP and IBM contain the stock prices of the
companies HP and IBM, respectively. In DB_3, ENTITY class IBM contains two attributes,
Date and StockPrice, where Date is the key attribute. Their NCL schemas are shown as
follows:

Figure 4.2. Three Component Schemas Containing Stock Information
(panels: Component Schema DB_1, Component Schema DB_2, Component Schema DB_3)


(* The Specification of component system DB_1 *)
DEFINE SCHEMA DB_1;
END_DEFINE;

DEFINE ENTITY Stock IN DB_1;
    Date: DATE;
    StkCode: STRING;
    TradePrice: STRING;
    UNIQUE
        Date, StkCode;
END_DEFINE;

(* The Specification of component system DB_2 *)
DEFINE SCHEMA DB_2;
END_DEFINE;

DEFINE ENTITY Stock IN DB_2;
    Date: DATE;
    HP: REAL;
    IBM: REAL;
    UNIQUE
        Date;
END_DEFINE;

(* The Specification of component system DB_3 *)
DEFINE SCHEMA DB_3;
END_DEFINE;

DEFINE ENTITY IBM IN DB_3;
    Date: DATE;
    StockPrice: REAL;
    UNIQUE
        Date;
END_DEFINE;

The three component schemas represent three different ways in which the components'
data resources are modeled. In schema DB_1, the company name is modeled as an
attribute value in the attribute StkCode; however, in schemas DB_2 and DB_3, it is
modeled as an attribute and an ENTITY class (i.e., IBM), respectively. Furthermore,
the representations of data values in these three component systems are also different.
The data representation of the stock price in DB_1 is different from that in DB_2 and
DB_3. In DB_1, the stock price is represented in the New York Stock Exchange
representation (e.g., 6\08), whereas in DB_2 and DB_3 it is in the common decimal
representation (e.g., 6.5).
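
A conversion between a fractional price quote and a decimal price can be sketched as follows. The exact string layout that DB_1 uses for the New York Stock Exchange representation is not spelled out here, so this Python sketch assumes a hypothetical layout of the form "whole numerator/denominator" (e.g., "6 1/2") and rounds to eighths when converting back. Both function names and the layout are assumptions for illustration only.

```python
from fractions import Fraction


def newyork_to_decimal(quote: str) -> float:
    """Convert an assumed 'whole numerator/denominator' quote to a decimal price."""
    total = Fraction(0)
    for part in quote.split():
        total += Fraction(part)   # Fraction accepts both "6" and "1/2"
    return float(total)


def decimal_to_newyork(price: float, denominator: int = 8) -> str:
    """Convert a non-negative decimal price to the assumed fractional layout."""
    ticks = round(price * denominator)          # nearest 1/denominator
    whole, remainder = divmod(ticks, denominator)
    if remainder == 0:
        return str(whole)
    fraction = Fraction(remainder, denominator)  # reduces 4/8 to 1/2
    return f"{whole} {fraction.numerator}/{fraction.denominator}"


print(newyork_to_decimal("6 1/2"))
print(decimal_to_newyork(6.5))
print(decimal_to_newyork(50.125))
```

The two functions are inverses of each other only for prices that fall exactly on a multiple of the chosen fraction; other decimal prices are rounded, which is one reason the two conversion directions have to be specified separately.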

This  example,  though  simple,  contains  not  only  naming  and  structural  conflicts 
but  also  representational  differences.  It  is  general  enough  to  illustrate  the  mediation 
process.  We  shall  use  this  example  throughout  the  rest  of  the  dissertation. 


CHAPTER  5 
MEDIATION  SPECIFICATION  LANGUAGE 


This chapter presents a Mediation Specification Language (MSL) that captures the
mediation information needed to resolve various data heterogeneity problems and to
support system extensibility. Section 5.1 presents a categorization of data heterogeneity
(i.e., the rationale for the language). Section 5.2 presents the language by describing
its design rationale and syntactic structure. Section 5.3 gives an example of using
the language to mediate the component schemas. Section 5.4 describes how MSL
supports system extensibility and scalability.

5.1    Categorization  of  Data  Heterogeneity 

Data heterogeneity problems have been thoroughly discussed in several publications
[BRE90, CHAT91, KIM91, SU91, VEN91, HAM93, CHAL94, GOH94]. Based
on the work of Goh, Madnick and Siegel [GOH94], these problems are categorized as
schematic heterogeneity and semantic heterogeneity.

Schematic heterogeneity includes two types of problems: naming and structural
conflicts. The naming conflicts include the synonym and homonym problems on both
attribute and entity type names. The structural conflicts are due to different ways
of modeling the same piece of information. For example, in Figure 5.1, the company
name, "IBM", can be used as an entity type name, an attribute name, and a value
of an attribute in different systems.


Database 1 (IBM is an attribute value)

    Date       StkName    TradePrice
    1/20/95    IBM        50.00
    1/20/95    HP         40.00

Database 2 (IBM is an attribute name)

    Date       IBM        HP
    1/20/95    50.00      40.00

Database 3 (IBM is an entity name)

    Entity IBM                     Entity HP
    Date       TradePrice          Date       TradePrice
    1/20/95    50.00               1/20/95    40.00

Figure  5.1.  Three  Examples  of  Showing  Structural  Conflicts 
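
The three structures in Figure 5.1 carry the same information, which can be verified by flattening each of them into (date, company, price) triples. The Python sketch below encodes the figure's rows as dictionaries; the variable and function names are chosen here for illustration.

```python
# Database 1: the company name is an attribute value.
db1 = [
    {"Date": "1/20/95", "StkName": "IBM", "TradePrice": 50.00},
    {"Date": "1/20/95", "StkName": "HP", "TradePrice": 40.00},
]

# Database 2: the company name is an attribute name.
db2 = [{"Date": "1/20/95", "IBM": 50.00, "HP": 40.00}]

# Database 3: the company name is an entity name.
db3 = {
    "IBM": [{"Date": "1/20/95", "TradePrice": 50.00}],
    "HP": [{"Date": "1/20/95", "TradePrice": 40.00}],
}


def from_db1(rows):
    return {(r["Date"], r["StkName"], r["TradePrice"]) for r in rows}


def from_db2(rows):
    # Every non-key column name is a company name.
    return {(r["Date"], name, price)
            for r in rows for name, price in r.items() if name != "Date"}


def from_db3(entities):
    # Every entity name is a company name.
    return {(r["Date"], name, r["TradePrice"])
            for name, rows in entities.items() for r in rows}


print(from_db1(db1) == from_db2(db2) == from_db3(db3))
```

Each flattening function moves the company name from a different structural level (value, attribute, entity) into the data, which is exactly the kind of mapping a mediator has to perform when a query crosses these systems.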


Semantic heterogeneity is due to different representations of data values. It
includes naming and other representational conflicts. The naming conflicts in attribute
values are seen as synonyms (e.g., 'IBM' and 'IBM Corp') and homonyms (e.g.,
persons with the same name). Other representational conflicts in this category include:
(1) measurement conflicts (e.g., US Dollar vs. Yen), (2) representation conflicts (e.g., New
York Stock Exchange representation vs. decimal representation), (3) confounding
conflicts (e.g., latest closing price vs. latest trade price), (4) granularity conflicts
(e.g., monthly pay vs. yearly pay), and (5) domain type conflicts (e.g., numerical
type vs. string type).

5.2    Description  of  Mediation  Specification  Language  (MSL)

The naming, structural, and representational differences of the component systems
shown in the example of Section 4.3 cause problems in query processing
and data access. Queries that can be processed in one system cannot be processed in
another. Entity and attribute names as well as data values need to be converted to suit
the other system. In order to mediate their differences, it is necessary to have a mediation
specification to define the interrelationships among the data specifications of component
systems and the methods to resolve them. Instead of manually hardcoding the
mediation information into a mediation program, the approach taken in this work is
to design a high-level mediation specification language to capture that information.

Figure 5.2. Mediation Specification on Top of N Component Schemas
(the Mediation Specification is linked by M associations to the component systems,
which are modeled as NCL Schema 1, NCL Schema 2, NCL Schema 3, ..., NCL Schema N)


Figure 5.2 gives an overall picture of how the mediation specification is composed
and related to the component schemas. In the figure, the specifications of N
component systems are modeled as NCL schemas. On top of the component schemas,
the mediation specification language is used to specify the interrelationships between
related concepts (i.e., ENTITY classes, attributes and values) and the methods that
reconcile the differences. The basic idea, also found in [COL91], is to create mediation
classes to link related ENTITY classes (i.e., M associations) based on the mediation
specification. Attributes of the related ENTITY classes are upward inherited by the
mediation classes. In the definition of each mediation class, the mappings between
related attributes are explicitly specified in the mediation clauses. In addition, the
semantic heterogeneities of data values are resolved by specifying the proper conversion
methods which convert data values from one representation to another.

Note that the mediation class which mediates related ENTITY classes
is not intended to create an integrated schema, but rather a mediated one. In this
mediation system, the mediation class does not unify the names of its upward inherited
attributes but contains descriptive clauses to mediate them. Therefore, in this
hierarchy, a user's view of data based on each component system is still preserved. A
user would issue a query based on his/her preferred view of data (i.e., use the naming,
syntactic and semantic representations of the schema of a particular component
system). If additional data are stored in other component systems with different
representations, the query needs to be modified to conform to those representations
before being processed by the component systems. The data returned from them
will have to be converted to conform to the user's view. Thus, the mediation process
needs to be closely coupled with distributed query processing.

5.2.1    Design  of  MSL 

The  design  rationale  of  MSL  is  based  on  the  following  requirements: 

•  The  language  must  be  able  to  explicitly  specify  the  naming,  structural  and 
semantic  relationships  among  heterogeneous  systems  to  capture  different  types 
of  data  heterogeneities  as  listed  in  Section  5.1. 

•  The  syntax  of  the  language  must  be  high-level  and  easy  for  system  designers 
to  specify  mediation  information. 

•  The  syntactic  constructs  should  conform  to  the  information  modeling  language
(in  our  case,  NCL)  so  that  the  translation  from  the  mediation  specification  to 
mediation  rules  and  methods  of  the  information  modeling  language  can  be  done 
more  easily. 

5.2.2    MSL  Syntax  Structure 

The  overall  syntactic  structure  of  the  mediation  specification  language  is  shown 
below.  Its  complete  BNF  rules  are  listed  in  Appendix  B. 

SCHEMA Med_Schema;
USE FROM schema_1(entity_1,...);
USE FROM schema_2(entity_2,...);

ENTITY super_entity_id
    ABSTRACT MEDIATION_TYPE OF (mediation_type_expression);
    ENTITY EQUIVALENCE (sch_1::entity_1,sch_2::entity_2,...);
    ATTRIBUTE EQUIVALENCE [(sch_1::entity_1.attr_1,sch_2::entity_2.attr_2,...);
                           | (attr_set_1, attr_set_2,...);]
    [VALUE EQUIVALENCE((sch_1::entity_1.attr_1,conv_method(sch_2::entity_2.attr_2),...);
                       | (sch_1::entity_1.attr_1,conv_method(attr_set_2),...);
                      );]
    [WHERE
        simple_condition;
        ... ]
END_ENTITY;
END_SCHEMA;

The mediation specification language conforms to the standard information
specification language, EXPRESS (a part of NCL), by using some of its keywords
and syntax. Two main constructs of MSL borrowed from EXPRESS are: (1) The
USE FROM clause, declared inside a schema, specifies the interface between the
mediation schema and the component schemas. All the component schemas and
their related ENTITY classes are imported by listing their identifiers in the clause.
(2) The ABSTRACT MEDIATION_TYPE clause (adapted from the ABSTRACT
SUPERTYPE clause) in an ENTITY class definition defines the mediation relation
from the mediation class being defined to a set of related ENTITY classes of the
component schemas. It also contains a mediation_type_expression that describes the
interrelationships between/among these related classes. Relation operators used in
the mediation_type_expression include ANDOR, AND and ONEOF, which represent
the constraints of Set-Intersection, Set-Equality, and Set-Exclusion, respectively. The
relationship constraints between/among the classes can contain multiple relation
operators. Four additional syntactic constructs of MSL are introduced and explained
below.

•  The ENTITY EQUIVALENCE clause is used to declare the equivalence
relationship between two or more entity types and to resolve the problem of
synonymous entity type names. Entity type names enclosed in the clause are
declared to be synonyms and are semantically equivalent. The synonym
relationships are used by the mediation language compiler to generate code for
query modification (a method of the Mediator).

•  The ATTRIBUTE EQUIVALENCE clause is used to represent the synonymous
relationship among a set of attributes, each of which can be composite. These
attributes have the same meaning and their values are inter-convertible. The
clause is also used to generate part of the implementation code for query
modification. The information of attribute name mappings is embedded in the code
of that method.

•  The VALUE EQUIVALENCE clause specifies the method of performing data
conversions between two different systems. The data conversion method specified
in the clause converts data of one attribute or a set of attributes into
the representation of another attribute. The data conversion is essential to
resolve the semantic heterogeneity problem, which may involve both irregular
(e.g., synonyms in data values) and regular (e.g., unit or measurement conflict,
etc.) data mappings. The synonym problem in data values can be resolved by
coding the pairwise mappings of equivalent data values (e.g., the value 'IBM' in
system A is equivalent to the value 'IBM Corp' in system B) in the implementation
of the conversion method. Similarly, mathematical functions (usually
simple ones) can be embedded in the conversion method to deal with regular
data mappings. The implementation of conversion methods can be done in
NCL or other programming languages.

•  The WHERE clause following the ATTRIBUTE EQUIVALENCE clause is used to
resolve both the homonym problem on values and the structural conflict between
two systems. Homonymous values are identified by specifying the equality
conditions of key attributes in the WHERE clauses. The attribute values
are identical only when the key attribute values are the same. In data modeling,
the same piece of information can be modeled in different ways, which causes
the problem of structural conflicts. The conversion between two attribute values
is possible when a special condition is true (e.g., an entity name is equal
to a particular attribute value). It represents a conditional mapping relationship
between related classes and is needed in query modification. The WHERE
clause is also used to specify the special condition for the mediation.

It  is  noted  that,  by  default,  all  entity  and  attribute  names  defined  in  different 
systems  are  assumed  to  be  unrelated  even  though  they  are  the  same,  unless  ENTITY 
EQUIVALENCE  and  ATTRIBUTE  EQUIVALENCE  clauses  are  used  to  explicitly 
specify  their  synonymous  relationships.  The  homonym  problem  on  attribute  values 
is  handled  by  using  the  WHERE  clause  to  explicitly  specify  the  equality  condition 
of  key  attributes  as  explained  above. 

5.3   Example  of  Mediation  Specification 

The  following  specification  is  to  mediate  the  three  stock  component  schemas 
given  earlier: 

[Figure 5.3: the mediation schema Mediation, containing the mediation class
Stock_1_2_3, drawn on top of Component Schema DB_1, Component Schema DB_2 and
Component Schema DB_3.]

Figure 5.3. Mediation Schema on Top of Three Component Schemas

SCHEMA Mediation;
USE FROM DB_1(Stock);
USE FROM DB_2(Stock);
USE FROM DB_3(IBM);

ENTITY Stock_1_2_3
  ABSTRACT MEDIATION-TYPE OF (DB_1::Stock ANDOR DB_2::Stock
                              ANDOR DB_3::IBM);
  ENTITY EQUIVALENCE (DB_1::Stock,DB_2::Stock,DB_3::IBM);
  ATTRIBUTE EQUIVALENCE (DB_1::Stock.Date,DB_2::Stock.Date,DB_3::IBM.Date);
  ATTRIBUTE EQUIVALENCE (DB_1::Stock.TradePrice,DB_2::Stock.HP);
    VALUE EQUIVALENCE (
      (DB_1::Stock.TradePrice,Decimal_to_NewYork(DB_2::Stock.HP));
      (DB_2::Stock.HP,NewYork_to_Decimal(DB_1::Stock.TradePrice)));
    WHERE
      DB_1::Stock.StkCode = 'HP';
  (* DB_1::Stock.TradePrice and DB_2::Stock.HP are equivalent only when
     DB_1::Stock.StkCode is equal to 'HP' *)
  ATTRIBUTE EQUIVALENCE (DB_1::Stock.TradePrice,DB_2::Stock.IBM,
                         DB_3::IBM.StockPrice);
    VALUE EQUIVALENCE (
      (DB_1::Stock.TradePrice,Decimal_to_NewYork(DB_2::Stock.IBM));
      (DB_2::Stock.IBM,NewYork_to_Decimal(DB_1::Stock.TradePrice));
      (DB_1::Stock.TradePrice,Decimal_to_NewYork(DB_3::IBM.StockPrice));
      (DB_3::IBM.StockPrice,NewYork_to_Decimal(DB_1::Stock.TradePrice)));
    WHERE
      DB_1::Stock.StkCode = 'IBM';
  (* DB_1::Stock.TradePrice is equivalent to DB_2::Stock.IBM and
     DB_3::IBM.StockPrice only when DB_1::Stock.StkCode is equal to 'IBM' *)
END_ENTITY;

END_SCHEMA;


As shown in Figure 5.3, the three component databases, DB_1, DB_2, and DB_3,
contain semantically related stock information. In the above mediation
specification, the USE clauses import the three schemas and their related ENTITY
classes. The ENTITY class Stock_1_2_3 is specified as a mediation class of
DB_1::Stock, DB_2::Stock and DB_3::IBM with two assumed pairwise ANDOR
constraints of EXPRESS, which specify that objects in these pairs of classes can
overlap. The attributes of the three related ENTITY classes are upward inherited
by Stock_1_2_3. Then, inside the definition of the ENTITY Stock_1_2_3, different
mediation clauses, such as ENTITY EQUIVALENCE, ATTRIBUTE EQUIVALENCE, VALUE
EQUIVALENCE and WHERE, specify the mediation information needed to reconcile
their schematic and semantic conflicts.

For example, the attributes DB_1::Stock.TradePrice and DB_2::Stock.HP are
equivalent when the value of the attribute DB_1::Stock.StkCode is equal to 'HP'.
This information is specified in the ATTRIBUTE EQUIVALENCE clause with the WHERE
condition DB_1::Stock.StkCode = 'HP'. The data of the two equivalent attributes
are convertible to each other by the two simple conversion methods specified in
the VALUE EQUIVALENCE clause. For example, DB_1::Stock.TradePrice can be
converted into DB_2::Stock.HP by the conversion method NewYork_to_Decimal.
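The conditional mapping in this example amounts to pivoting DB_1 rows, which hold one price per stock code, into DB_2 rows, which hold one column per stock. The following Python sketch illustrates that reading under stated assumptions: the row layout, the helper names and the fractional quote format are hypothetical, and the sample values are made up for illustration.

```python
from fractions import Fraction

# Conditional mappings read off the WHERE clauses: DB_1::Stock.TradePrice
# corresponds to a DB_2::Stock column only for the given StkCode value.
STKCODE_TO_DB2_COLUMN = {"HP": "HP", "IBM": "IBM"}


def newyork_to_decimal(quote: str) -> float:
    """Assumed fractional quote format, e.g. '30 1/2' -> 30.5."""
    return float(sum(Fraction(part) for part in quote.split()))


def db1_to_db2(rows, convert=newyork_to_decimal):
    """Pivot DB_1 rows (Date, StkCode, TradePrice) into DB_2 rows
    (Date, HP, IBM), converting each price on the way."""
    by_date = {}
    for row in rows:
        column = STKCODE_TO_DB2_COLUMN.get(row["StkCode"])
        if column is None:
            continue  # no equivalence is declared for this stock code
        target = by_date.setdefault(row["Date"], {"Date": row["Date"]})
        target[column] = convert(row["TradePrice"])
    return list(by_date.values())
```

Rows whose StkCode has no declared equivalence are skipped, which corresponds to the default assumption that undeclared names and values are unrelated.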

5.4    Scalable  Multilevel  Mediation 

In a heterogeneous information system, the component systems which provide data
or services may change over time: old component systems become obsolete and are
removed, and new component systems that provide more up-to-date information or
better services are added. This calls for system extensibility. Furthermore, to
achieve a large amount of information exchange and sharing, the number of
component systems can become rather large. In this case, scalability becomes a
critical requirement. The traditional schema integration approach is not suitable
because the potentially large number of data heterogeneities among different
component systems can impede the integration process and become unmanageable.

5.4.1    Multi-level  Mediation  Hierarchies 

The requirements for extensibility and scalability have motivated our work to
form a multi-level mediation hierarchy by which both requirements can be met. The
mediation specifications that capture lower-level mediation information can be
reused to generate the upper-level mediation specifications. The basic idea is
similar to the upward inheritance of attributes by a mediation class: the
mediation specifications can also be upward "inherited" in such a hierarchy.
Since the mediation specifications already capture the mediation information for
mediating the component systems in a certain scope, they can be reused to
generate the mediation specification of a larger scope of component systems. This
reusability greatly reduces the effort and cost of composing upper-level
mediation specifications.

Figure 5.4 illustrates the idea of reusing the mediation specifications. The
mediation specifications defined at lower levels are reused and combined with the
top-level mediation specification to generate the new mediation elements that
perform the mediation tasks in a broader scope (i.e., one containing more
component systems). In Figure 5.4, four component systems named DB_1, DB_2, DB_3
and DB_4 are at level 0 of the mediation hierarchy. Each component system
contains two ENTITY classes which are related to the ENTITY classes of other
component systems. Then, in level 1, the component systems are mediated by two
mediation specifications, 1A (for DB_1 and DB_2) and 1B (for DB_3 and DB_4). Each
mediation specification can be compiled to generate the mediation elements that
perform the mediation process in its scope. Suppose that the mediation elements
for mediating all four component systems are to be created. The two mediation
specifications (1A and 1B) can be reused and combined with the new mediation
specification, 2A, which mediates the two separate sets of component systems. The
advantage of this strategy is that redundant mediation specifications are
avoided. Note that, since the compilation approach is taken, the compiler of the
mediation language is designed to combine multiple mediation specifications and
generate the mediation elements that deal with a broader scope of information
mediation.

Figure 5.4. Multi-level Mediation: Build-time Hierarchy
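The build-time hierarchy of Figure 5.4 can be pictured as a small tree. The Python sketch below is only an illustration of the reuse idea: the dictionary layout and the function names scope and specs_to_combine are assumptions, not part of the mediation language or its compiler.

```python
# Build-time hierarchy of Figure 5.4: each mediation specification lists its
# children, which are component systems or lower-level specifications.
HIERARCHY = {
    "1A": ["DB_1", "DB_2"],
    "1B": ["DB_3", "DB_4"],
    "2A": ["1A", "1B"],
}


def scope(node, hierarchy=HIERARCHY):
    """Component systems covered by a node, in left-to-right order."""
    children = hierarchy.get(node)
    if children is None:  # a terminal node is a component system
        return [node]
    covered = []
    for child in children:
        covered.extend(scope(child, hierarchy))
    return covered


def specs_to_combine(node, hierarchy=HIERARCHY):
    """Specifications combined when compiling `node`: the reused
    lower-level ones first, then the node itself."""
    if node not in hierarchy:
        return []
    combined = []
    for child in hierarchy[node]:
        combined.extend(specs_to_combine(child, hierarchy))
    combined.append(node)
    return combined
```

For example, specs_to_combine("2A") yields the two reused level-1 specifications followed by 2A, and scope("2A") covers all four component systems.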

Figure 5.5 shows the mediation hierarchy after generating three sets of mediation
elements, 1A, 1B and 2A. At run-time, the DQP of each set of mediation elements
works independently to receive and process queries in a different scope. The
dotted arrow lines show the method calls invoked directly from the DQP to the
SQPs of the component systems (i.e., SQP_0x). The method calls are then trapped
by mediation rules to invoke local mediators to perform mediation operations.

Figure 5.5. Multi-level Mediation: Run-time Hierarchy

In this hierarchy, multiple sets of mediation elements are generated to process
queries in different scopes of component systems. We believe that this scoped
information access has significant merit because, in most cases, users may not be
interested in retrieving data from all the information sources of the entire
mediation system. The multi-level mediation hierarchy can support different
levels of grouping of component systems and their data resources. In practice,
the component systems providing data are usually grouped by certain criteria. For
example, information sources in the same geographical area can be grouped
together and mediated by a set of mediation elements. Information sources of
neighboring areas can then be further grouped to form a larger mediation scope.
Queries for retrieving data in a particular scope can then be submitted to the
relevant mediation elements for processing.

Furthermore,  this  multi-level  mediation  hierarchy  can  provide  better  system 
availability.  Compared  with  a  centralized  mediation  system  in  which  any  changes 
would  cause  the  interruption  of  mediation  services,  changes  made  to  the  multi-level 
mediation  hierarchy  only  affect  some  of  its  distributed  mediation  elements.  All  others 
can  still  process  queries  in  their  mediation  scopes. 

5.4.2    Adding  or  Deleting  Component  Systems  and  Mediation  Specifications 

In  the  multilevel  mediation  hierarchy,  adding  or  deleting  component  systems  is 
done  by  changing  the  mediation  specifications  which  have  been  affected  and  then 
recompiling  them  to  generate  new  mediation  elements.  At  build-time,  the  mediation 
hierarchy  is  a  tree  structure  in  which  terminal  nodes  are  component  systems  and  non- 
terminal nodes  are  mediation  specifications.  Any  addition  or  deletion  of  component 
systems  will  cause  changes  to  be  made  in  some  nodes  of  the  tree. 

Generally, there are three alternatives for adding new component systems. To
illustrate them, we shall use an example. As seen in Figures 5.6, 5.7 and 5.8, a
three-level mediation hierarchy (i.e., the uncircled part) exists before the new
component systems are appended. In the lowest level, there are six component
schemas, each of which contains an ENTITY class (i.e., L01 to L06). The six
ENTITY classes are related and mediated by the two mediation specifications,
MSpec11 and MSpec12, with the mediation classes L11 and L12, respectively. On top
of the two mediation classes, another mediation class L21 is defined to mediate
the two groups of component systems.

Based on the example, three component systems (each represented as a dotted
rectangle) are added, and each contains an ENTITY class (i.e., L07, L08 and L09)
related to the existing ones. The addition of the new component systems results
in three possible types of configurations:

(1) Appending new component systems to an existing mediation specification: In
this case, new component systems are directly inserted under the node of an
existing mediation specification. As shown in Figure 5.6, the three new ENTITY
classes are inserted under the node of one of the existing mediation
specifications. To add the new component systems, that mediation specification
needs to be changed.


Figure 5.6. Appending New Component Systems to an Existing Mediation Specification

(2) Creating a new non-root mediation specification: This alternative creates a
new mediation specification to mediate the new component systems, forming a
subtree. The subtree is then inserted somewhere in the existing mediation
hierarchy tree. Figure 5.7 depicts this case: a new mediation specification
MSpec13 is created to mediate the three new component systems, and the subtree
with the root L13 is inserted under the mediation specification MSpec21.

Figure 5.7. Creating a New Non-root Mediation Specification


Figure  5.8.  Creating  a  New  Root  Mediation  Specification 


(3) Creating a new root mediation specification: The last alternative is to
create a new mediation specification on top of the existing mediation hierarchy
tree and the new component systems. As shown in Figure 5.8, a new mediation
specification, MSpec31, is created to mediate the existing mediation tree with
the root L21 and the three new component systems.

Note that the first two alternatives may require some changes to the mediation
specifications on the path from the new component systems to the root, as shown
by the thick lines in Figures 5.6 and 5.7. A mediation specification on the path
needs to be modified only if some entity classes of the new component systems
relate to the entity classes of its existing component systems; otherwise, no
change is needed.
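The path rule can be sketched on a small tree. The Python sketch below is a hypothetical illustration: the split of the six ENTITY classes between MSpec11 and MSpec12, the choice of MSpec12 as the insertion point, and all function names are assumptions made only for the example. It also treats the insertion point as always changed, since that specification must at least import the new schema.

```python
# Assumed build-time tree before the addition (three classes per group).
CHILDREN = {
    "MSpec11": ["L01", "L02", "L03"],
    "MSpec12": ["L04", "L05", "L06"],
    "MSpec21": ["MSpec11", "MSpec12"],
}


def leaves(node, children):
    """ENTITY classes of the component systems in the scope of `node`."""
    if node not in children:
        return {node}
    found = set()
    for child in children[node]:
        found |= leaves(child, children)
    return found


def path_to_root(node, children):
    """Mediation specifications above `node`, nearest first."""
    parent = {c: p for p, cs in children.items() for c in cs}
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def specs_to_modify(attach_to, related, children):
    """Specifications to change when a new class is appended under
    `attach_to`; `related` holds the existing classes it relates to."""
    changed = [attach_to]  # the insertion point gains the new schema
    for spec in path_to_root(attach_to, children):
        if related & leaves(spec, children):
            changed.append(spec)
    return changed
```

With an empty `related` set only the insertion point changes, which matches the "otherwise, no change is needed" case in the text.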

The deletion of component systems, on the other hand, is done by modifying the
mediation specifications on the corresponding path of the mediation hierarchy
tree. All mediation clauses containing the entity classes to be removed should be
modified accordingly. Since the mediation specification language is high-level in
its representation, it is easier to understand, modify and maintain in response
to change requirements than hard-coded mediators.


CHAPTER  6 
SYSTEM  IMPLEMENTATION 


This  chapter  presents  the  implementation  of  the  mediation  system  which  uses 
the  two  languages  described  in  Chapters  4  and  5.  To  illustrate  the  build-time  process 
of  the  mediation  system,  the  order  of  presentation  is  reversed  from  the  order  of 
describing  the  two  languages.  The  implementation  of  the  MSL  translator  is  described 
first  followed  by  the  implementation  of  the  NCL  compiler. 

6.1    MSL  Translator 

The  MSL  translator  translates  a  mediation  specification  into  a  set  of  mediation 
elements  for  supporting  run-time  mediation  and  query  processing.  We  shall  describe 
the  generation  of  the  mediation  elements  and  give  some  generated  examples  below. 

6.1.1    Generation  of  Mediation  Elements 

The MSL translator takes a mediation specification containing descriptive
mediation clauses as its input. The mediation specification is first checked
syntactically and semantically, and a parse tree of its original representation
is then produced. Using this tree representation, the MSL translator generates a
set of mediation elements, as depicted in Figure 6.1. The generated mediation
elements are described below.

[Figure 6.1: a Mediation Specification is input to the Mediation Language
Translator, which outputs:
  NCL code for the method implementations of the Mediator and Query Processor;
  mediation classes of the ENTITY classes in the component systems;
  the Distributed Query Processor object class, with (M) global query execution
  and (R) rule_query_propagation;
  the Subquery Processor object classes, with (M) query execution,
  (R) rule_query_modification and (R) rule_data_conversion;
  the Mediator object classes, with (M) query modification and
  (M) data conversion.]

Figure 6.1. Compilation of Mediation Specification

•  Mediation classes: These mediation classes mediate the ENTITY classes of
different component systems if they contain semantically related objects. The
specifications of these mediation classes also capture the set membership
constraints among the related ENTITY classes, such as Set-Equality,
Set-Exclusion, or Set-Intersection. This information is useful for query
optimization (e.g., if two component systems contain identical objects and their
data are the same, then there is no need to send a query to both systems).

•  Mediator object classes: These classes model the mediators which are
generated based on the mediation specifications associated with pairs of related
component schemas. For the sake of query processing efficiency, the Mediator
classes are distributed and linked with different component systems at different
sites so that mediation operations can be processed locally. Each Mediator
contains the methods that actually perform query modification and data
conversion. These methods are invoked by the Subquery Processor of the component
system when mediation is needed.

•  Distributed Query Processor (DQP) object class: The Distributed Query
Processor class contains a method which performs distributed query processing in
the heterogeneous system. This method, with a query as its parameter, is invoked
by the user/application program. The implementation of this method is based on a
built-in query processing algorithm. The DQP class has an ECAA mediation rule
which, when triggered, calls a knowledge base management system (OSAM*.KBMS
[SU95]) to obtain the metadata for locating the information sources and
propagating subqueries.

•  Subquery  Processor  (SQP)  object  classes:  These  classes  are  Subquery 
Processors  generated  for  the  component  systems.  Each  SQP  receives  a  subquery 
from  the  DQP  and  calls  the  Mediator,  if  a  subquery  conversion  is  needed,  to 
generate  a  mediated  subquery.  The  SQP  then  sends  the  mediated  subquery  to 
the  wrapper  which  converts  it  into  a  native  query,  command  or  API  processable 
by  the  component  system.  The  SQP  object  class  contains  a  main  method  which 
triggers  two  mediation  rules.  One  is  to  modify  the  subquery  to  conform  to  the 
naming  and  structural  convention  of  the  component  system  before  the  main 
method  is  executed.  The  other  is  to  convert  the  returned  data  from  its  local 
representation  to  the  one  expected  by  the  user  after  the  main  method  finishes 
its  execution. 

•  Mediation methods: Based on the mediation specification (i.e., the ENTITY
EQUIVALENCE, ATTRIBUTE EQUIVALENCE, VALUE EQUIVALENCE and WHERE clauses), the
implementation code for the mediation methods, including query modification, data
conversion, etc., can either be generated automatically or provided by the system
administrator.


All the above mediation elements, including the DQP, SQPs, distributed Mediators,
mediation rules, and method implementations, are generated and uniformly
represented in NCL. The NCL code is then translated into executable code (C/C++)
by an NCL compiler. The next section gives some examples describing the
generation of the Mediator object classes and the mediation rules associated with
the DQP and SQPs of component systems.

6.1.2    Examples  of  Generated  Mediation  Elements 

Generation  of  Mediators 

Each  mediator,  linked  to  a  component  system,  provides  pairwise  conflict  reso- 
lution between  the  component  system  and  other  component  systems.  The  following 
mediator  class  of  the  component  system  DB_1  is  a  part  of  the  output  generated  from 
the  mediation  specification  given  in  Chapter  5. 


DEFINE ENTITY Mediator IN DB_1;
METHODS:
  METHOD modify_query(INOUT ps: Query): VOID;
  METHOD data_conversion(INOUT result: SET OF Data): VOID;
  METHOD convert(INOUT data: Data): VOID;
  METHOD change_name_add_cond(IN s_name: STRING; IN t_name: STRING;
                              INOUT ps: Query; IN condition: STRING): VOID;
  METHOD change_name_rm_cond(IN s_name: STRING; IN t_name: STRING;
                             INOUT ps: Query; IN condition: STRING): VOID;
  METHOD NewYork_to_Decimal(INOUT data: Data): VOID;
END_DEFINE;

METHOD Mediator::modify_query(INOUT ps: Query): VOID;
  IF (ps.s_schema = 'DB_2' AND ps.t_schema = 'DB_1')
  THEN
    change_name_add_cond('IBM', 'TradePrice', ps, 'StkCode="IBM"');
    change_name_add_cond('HP', 'TradePrice', ps, 'StkCode="HP"');
  END_IF;

  IF (ps.s_schema = 'DB_3' AND ps.t_schema = 'DB_1')
  THEN
    change_name_add_cond('StockPrice', 'TradePrice', ps, 'StkCode="IBM"');
    change_name_rm_cond('IBM', 'Stock', ps, '');
  END_IF;
END_METHOD;

METHOD Mediator::data_conversion(INOUT result: SET OF Data): VOID;
  LOCAL
    num_column, i: INTEGER;
  END_LOCAL;

  num_column := SIZEOF(result);
  FOR i=1 UNTIL i > num_column BY i = i+1 DO
    IF (result[i].s_attr <> result[i].t_attr)
    THEN
      convert(result[i]);
    END_IF;
  END_FOR;
END_METHOD;

METHOD Mediator::convert(INOUT data: Data): VOID;
  LOCAL
    i: INTEGER;
  END_LOCAL;

  FOR i=1 UNTIL i > data.size BY i = i+1 DO
    IF (data.s_attr = 'DB_1.Stock.TradePrice' AND
        data.t_attr = 'DB_2.Stock.HP')
    THEN
      NewYork_to_Decimal(data.value_set[i]);
    END_IF;

    IF (data.s_attr = 'DB_1.Stock.TradePrice' AND
        data.t_attr = 'DB_2.Stock.IBM')
    THEN
      NewYork_to_Decimal(data.value_set[i]);
    END_IF;

    IF (data.s_attr = 'DB_1.Stock.TradePrice' AND
        data.t_attr = 'DB_3.IBM.StockPrice')
    THEN
      NewYork_to_Decimal(data.value_set[i]);
    END_IF;
  END_FOR;
END_METHOD;


(* Common data structures of query and returned data *)

(* Object class 'Query' for storing query statement and data result *)
DEFINE ENTITY Query;
  query_node: Query_node;   (* query node *)
  s_schema: STRING;         (* source schema name *)
  t_schema: STRING;         (* target schema name *)
  result: SET OF Data;      (* query result *)
END_DEFINE;

(* Object class 'Data' is defined for standard data interchange *)
DEFINE ENTITY Data;
  (* The first four are attribute-value information returned by the component
     system. The last one is stored by the mediator to specify the target
     attribute *)
  value_set: SET OF Generic_Data_Type;   (* result values *)
  size: INTEGER;                         (* size of SET *)
  data_type: STRING;                     (* data type *)
  s_attr: STRING;                        (* attribute name in source schema *)
  t_attr: STRING;                        (* attribute name in target schema *)
END_DEFINE;

(* Generic Data Type: a union of all possible data types *)
DEFINE TYPE Generic_Data_Type =
  SELECT OF (string_type, number_type, boolean_type, Data);
END_DEFINE;


The above generated class and method definitions make reference to two data
structures for recording the query and data. They are modeled by two entity
classes, Query and Data. A TYPE class, Generic_Data_Type, is also used. The Query
class records the tree-structured representation of the query (in the attribute
query_node), its original schema/view (in the attribute s_schema) and the schema
identifier (in the attribute t_schema) of the target system that will process the
query. The data received after query processing are stored as a set of Data
objects (in the attribute result). The Data class records a set of returned
values of a particular attribute of the source schema (i.e., a column of returned
data). It also contains the attribute name of the target schema. If the source
and target attributes are different, query modification and data conversion need
to be carried out. The TYPE Generic_Data_Type is a union structure of all the
data types of the returned values.

The object class Mediator contains several methods, including modify_query(),
data_conversion() and convert(). The first two are the main methods that perform
query modification and data conversion, respectively. The method convert() is a
subroutine of data_conversion().

As seen in the method implementation code of modify_query(), the query is
modified by first matching its source and target schema identifiers to decide
which set of query modification operations is to be applied. This query
modification method is generated from the specifications of the ENTITY
EQUIVALENCE, ATTRIBUTE EQUIVALENCE and WHERE clauses. Since the query is passed
as a query tree, the query modification is done by operations on the nodes of the
tree (i.e., deletion, insertion, substitution, etc.). Due to possible structural
conflicts between component schemas, query modification cannot always be done by
direct name substitutions in tree nodes. The method change_name_add_cond() (its
implementation is not shown) is called for either conditional or unconditional
name changes. If the condition parameter (i.e., the fourth parameter) is passed
with a value, the query modifier not only changes names but also adds the
condition expression to the query tree. For example, if the source schema
identifier is 'DB_2' and the target schema identifier is 'DB_1', the attribute
name 'IBM' in the query is replaced by 'TradePrice' and the condition expression
'StkCode="IBM"' is added. Another method, change_name_rm_cond() (its
implementation is not shown either), modifies the query by changing names and
removing the condition expression from the query.
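Since the implementations of the two name-changing methods are not shown, the following Python sketch gives one possible reading of their behavior. It is a simplification under stated assumptions: the query tree is flattened into a list of names plus a list of condition strings, and the field names `names` and `conditions` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Simplified stand-in for the query tree: a flat list of entity and
    attribute names plus a list of condition strings."""
    s_schema: str
    t_schema: str
    names: list
    conditions: list = field(default_factory=list)


def change_name_add_cond(s_name, t_name, ps, condition):
    """Replace s_name by t_name; if a condition is given, add it as well."""
    if s_name not in ps.names:
        return
    ps.names = [t_name if n == s_name else n for n in ps.names]
    if condition and condition not in ps.conditions:
        ps.conditions.append(condition)


def change_name_rm_cond(s_name, t_name, ps, condition):
    """Replace s_name by t_name and drop the condition if it is present."""
    if s_name not in ps.names:
        return
    ps.names = [t_name if n == s_name else n for n in ps.names]
    if condition in ps.conditions:
        ps.conditions.remove(condition)


def modify_query(ps):
    """Mirror of the DB_2 -> DB_1 branch of the generated method."""
    if ps.s_schema == "DB_2" and ps.t_schema == "DB_1":
        change_name_add_cond("IBM", "TradePrice", ps, 'StkCode="IBM"')
        change_name_add_cond("HP", "TradePrice", ps, 'StkCode="HP"')
```

A query over DB_2 that mentions the attribute IBM is thus rewritten to mention TradePrice and gains the condition StkCode="IBM"; a query that mentions neither IBM nor HP is left unchanged.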

For data conversion, the method convert() is called when a data conversion is
required. The generation of this method is based on the specifications of the
ATTRIBUTE EQUIVALENCE and VALUE EQUIVALENCE clauses. The method compares the
source and target attribute names in the Data object with the pairs of attribute
names for which conversions are specified. If a pair matches, a data conversion
method is invoked to perform the data conversion on each of the data values in
the column. For example, if the source attribute name is 'DB_1.Stock.TradePrice'
and the target attribute name is 'DB_2.Stock.HP', the method NewYork_to_Decimal()
is invoked to convert the data. This method has to be coded by a programmer; it
cannot be generated automatically.
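The column-wise dispatch performed by data_conversion() and convert() can be sketched in Python as a lookup on the (source, target) attribute pair. This is an illustrative sketch rather than the generated NCL code: the CONVERSIONS table is a hypothetical replacement for the chain of IF statements, and the fractional quote format is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Data:
    """One column of returned values with its source and target names."""
    value_set: list
    s_attr: str
    t_attr: str


def newyork_to_decimal(quote):
    """Hand-coded conversion; assumes a quote such as '45 3/8'."""
    return float(sum(Fraction(part) for part in quote.split()))


# (source attribute, target attribute) -> conversion method
CONVERSIONS = {
    ("DB_1.Stock.TradePrice", "DB_2.Stock.HP"): newyork_to_decimal,
    ("DB_1.Stock.TradePrice", "DB_2.Stock.IBM"): newyork_to_decimal,
    ("DB_1.Stock.TradePrice", "DB_3.IBM.StockPrice"): newyork_to_decimal,
}


def convert(data):
    """Convert every value in the column if its name pair is registered."""
    method = CONVERSIONS.get((data.s_attr, data.t_attr))
    if method is not None:
        data.value_set = [method(v) for v in data.value_set]


def data_conversion(result):
    """Call convert() on each column whose source and target names differ."""
    for column in result:
        if column.s_attr != column.t_attr:
            convert(column)
```

Columns whose name pair has no registered conversion, such as a Date column, pass through unchanged.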

Generation  of  Distributed  and  Subquery  Processors  and  Mediation  Rules 

The  distributed  query  processor  (DQP)  is  a  component  system  which  receives 
and  processes  global  queries  against  some  or  all  component  systems.  Based  on  the 
mediation  specification,  the  generated  class  definitions  of  the  DQP  and  the  subquery 
processor  SQP  are  shown  below: 

(* Definition of object class 'Distributed_Query_Processor' *)
DEFINE ENTITY DQP IN DQP_SCHEMA;
  global_query: Query;        (* declaration for global query *)
  subqueries: SET OF Query;   (* declaration for subqueries *)
  prop_subqs: SET OF Query;   (* for propagated subqueries *)
  prop_index: INDEX;          (* index of propagated subqueries *)
METHODS:
  (* global_query_execution() is the entry point *)
  METHOD global_query_execution(IN query_txt: STRING): SET OF Data;
  METHOD query_initialization(IN query_txt: STRING): VOID;
  METHOD syntax_semantic_checking(): BOOLEAN;
  METHOD error_handler(): VOID;
  METHOD subquery_generation(IN query_txt: STRING): VOID;
  METHOD decompose_query(IN query_txt: STRING): SET OF Query;
  METHOD dispatch_subquery(): VOID;
  METHOD assemble_mediated_results(INOUT q: Query; IN subqs: SET OF Query)
                                   : VOID;
  METHOD result_merge_join(): VOID;
  METHOD query_execution(): VOID;
RULES:
  RULE rule_query_propagation;
    TRIGGER IMMEDIATE AFTER subq_initialization(s)
    ACTION
      prop_subqs := KBMS::propagate_queries(s);
  END_RULE;
END_DEFINE;

METHOD DQP::global_query_execution(IN query_txt: STRING):
                                   SET OF Data;
  LOCAL
    s: Query;
    ps: Query;
  END_LOCAL;

  query_initialization(query_txt);
  IF (syntax_semantic_checking() == FALSE)
  THEN
    error_handler();
  END_IF;

  subquery_generation(global_query.query_stmt);
  FOREACH (s: subqueries)
    subq_initialization(s);
    FOREACH (ps: prop_subqs)
      dispatch_subquery(ps);
    END_FOREACH;
    assemble_mediated_results(s, prop_subqs);
  END_FOREACH;

  result_merge_join();
  RETURN (global_query.result);
END_METHOD;

(* Component Schema of System DB_1 *)
DEFINE SCHEMA DB_1;
END_DEFINE;

DEFINE ENTITY SQP IN DB_1;
  m: Mediator;
METHODS:
  METHOD query_execution(INOUT q_obj: Query_obj): VOID;
RULES:
  RULE rule_query_modification;
    TRIGGER BEFORE query_execution(q_obj)
    CONDITION q_obj.s_schema <> q_obj.t_schema
    ACTION
      m.modify_query(q_obj);
  END_RULE;

  RULE rule_data_conversion;
    TRIGGER IMMEDIATE AFTER query_execution(q_obj)
    ACTION
      m.data_conversion(q_obj.result);
  END_RULE;
END_DEFINE;


The DQP class models the distributed query processor. Its main method is
global_query_execution(). By calling this method, a global query is submitted,
syntactically checked and decomposed into subqueries if the original query
requires the processing of data residing in multiple component systems. Each
subquery is dispatched to the SQP of a component system by invoking its main
method, query_execution(). After the subqueries finish their executions, the
returned data are assembled, merged and joined to produce the final result of the
original query.

To perform run-time mediation, three ECAA mediation rules are generated to
mediate the distributed query processing: rule_query_propagation,
rule_query_modification and rule_data_conversion.

The first rule, rule_query_propagation, is triggered by the DQP immediately
after each subquery is initialized and ready for dispatching to the query
server of a component system. The action part of the rule calls a method of the
KBMS, propagate_queries(), to get the mediation information that relates the
data referenced by the subquery to the data stored in other component systems.
The subquery and the schema identifiers of these component systems are entered
into the data structure defined in the entity class Query. The purpose is to
replicate the subquery into as many subqueries as there are component systems
that contain related data. At this stage, however, the replicated subqueries
may not be ready for processing, because the names and structures they use may
not be recognized by the component systems that will process them.

The second rule, rule_query_modification, is associated with the SQP of a
component system and is triggered before each subquery is executed. It checks
whether the source and target schemas/views are different. If they are, the
method modify_query of the Mediator is called to convert the subquery into the
representation suitable for the target system. For data conversion purposes,
this method also inserts the attribute names used in both the source and target
schemas for each data object to be returned by the subquery.

After the subquery is processed, the retrieved data are returned and the
after-rule, rule_data_conversion (also associated with the SQP of the component
system), is triggered. Its action checks each column of data to see whether
data conversion is needed by comparing the attribute names in the source and
target schemas/views. If a naming difference exists, a data conversion is
performed.
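
The replication performed by rule_query_propagation can be pictured with a
small sketch. This is only an illustration in Python, not the generated C/C++
code: the QueryEntry record, the RELATED_SYSTEMS table (standing in for the
mediation information held by the KBMS) and the signature given to
propagate_queries are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the KBMS mediation information: for each
# (schema, class) pair, the other schemas that store related data.
RELATED_SYSTEMS = {
    ("DB_1", "Stock"): ["DB_2", "DB_3"],
}


@dataclass
class QueryEntry:
    """One instance of the Query entity class: a subquery plus schema ids."""
    text: str
    s_schema: str  # schema the subquery is expressed in (source)
    t_schema: str  # schema of the system that will process it (target)


def propagate_queries(text, s_schema, class_name):
    """Replicate a subquery once per component system holding related data."""
    targets = [s_schema] + RELATED_SYSTEMS.get((s_schema, class_name), [])
    return [QueryEntry(text, s_schema, t) for t in targets]


entries = propagate_queries("CONTEXT s:DB_1::Stock RETRIEVE s.Date", "DB_1", "Stock")
for e in entries:
    # Entries whose schemas differ still need query modification later.
    print(e.t_schema, "needs modification:", e.s_schema != e.t_schema)
```

Each replicated entry carries both schema identifiers, which is what lets the
later before-rule decide locally whether a subquery has to be rewritten.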
6.2    NCL  Compiler


The generated mediation elements described above (i.e., mediation classes,
DQP, SQPs, Mediators, etc.) and a collection of NCL component schemas combine
to form a mediated global schema. The mediated schema is then compiled by the
NCL compiler to generate program code, which is distributed and linked in the
CORBA-based client/server communication environment. Figure 6.2 depicts the
general build-time procedure of compiling the mediated global schema into
C/C++ program bindings.

[Figure 6.2 (diagram not reproduced). Inputs: the mediation elements (mediation
classes, DQP and SQP classes, Mediator classes, and the NCL implementation code
of mediation methods and mediation rules) and the NCL component schemas. NCL
compiler: NCL-to-K.3 translator, K.3 compiler, IDL compiler. Outputs:
meta-information (KBMS); C/C++ code for ECAA rules and mediation
implementation; C/C++ skeleton (server); C/C++ stub (client) with the client
program; execution code for server and execution code for client.]

Figure 6.2. Compilation of a Mediated Global Schema


As shown in Figure 6.2, the NCL compiler consists of three parts. First, using
the OSAM*.KBMS as the underlying supporting system, an NCL-to-K.3 translator
translates the NCL schema into the corresponding K.3 specification. K.3 is the
third version of a knowledge base programming language developed at the
Database Systems Research and Development Center [ARR97]. Second, the generated
K.3 specification is compiled by the K.3 compiler to generate IDL
specifications and C/C++ program code. Third, an IDL compiler takes the IDL
specifications as its input and generates enhanced CORBA program bindings to
set up the build-time network environment for both the client and server
sides. The first part of the implementation is described in Section 6.2.1 and
the remaining parts in Section 6.2.2.

6.2.1    NCL-to-K.3  Translation 

The NCL-to-K.3 translation is essentially a language mapping process. Its
implementation is based on a syntactic and semantic analysis that maps each NCL
construct to an equivalent K.3 representation. As listed in Figure 6.3, the
semantic properties of NCL classes, including attributes, supertype/subtype
specifications, additional associations (e.g., INTERACTION), keyword
constraints, user-defined rules, and method specifications and
implementations, are translated into corresponding K.3 constructs. Some
constraints of NCL, such as the UNIQUE keyword constraint, are supported by
extending the rule binder of the OSAM*.KBMS meta model. As shown in Figure 6.4,
the meta model of the KBMS is based on a self-describing O-O modeling mechanism
(i.e., using the model to model itself). Extensions to the meta model can be
made by changing the pre-defined meta specifications. Because the underlying
supporting system, OSAM*.KBMS, is extensible, additional constructs can be
easily added to NCL without entailing changes to the NCL compiler.

Figure 6.4. Meta Model of the Underlying System, OSAM*.KBMS

In addition to serving as the underlying system for NCL compilation, the KBMS
also serves as the storage manager of the mediated global schema. While a K.3
specification translated from an NCL schema is being compiled, the schema
information is also populated into the KBMS. At run-time, the global schema
information can be retrieved by issuing meta queries against the KBMS. For more
details about the KBMS, readers may refer to [SHY96, SU95].

6.2.2    Generation  of  Enhanced  CORBA  Bindings 

After an NCL schema is translated into a K.3 specification, the K.3 compiler
translates the K.3 specification into C or C++ code, which contains expanded
method calls that implement the event monitoring and rule processing
functions. Figure 6.5 illustrates this process of generating program code and
bindings in a CORBA environment. In addition to the C/C++ code, IDL
specifications for the attributes and the NCL methods are also generated. The
IDL specifications are needed because the methods that provide services to
clients may be physically distributed across different servers; the IDL
specifications provide the means for achieving interoperability through the
ORB.

The final step is to generate the C or C++ programming language bindings for
the NCL methods and insert the generated C or C++ code into the skeletons. The
bindings are generated by the IDL compiler when the IDL specifications are
compiled. Each server site has the bindings and the implementations of its
services in its native programming language. A client wanting to use a service
accesses it through the corresponding binding in the client's programming
language. Interoperability is provided by the ORB. Note that the bindings
reflect the interface of the original NCL method; the implementation of the
method, however, consists of the C or C++ code that implements the function of
that method plus the C or C++ code that enforces the before-, after-, and
immediate-after rules associated with it. At run-time, activation of a method
automatically triggers the execution of the methods that implement the CAA
parts of rules, which may in turn trigger the method implementations of other
rules. It is this enhancement (i.e., the automatic processing of distributed
event monitoring and rule processing functions) that makes the ORB "active"
and provides rule-based interoperability within the CORBA environment (see
[SU96] for more details).

Figure 6.5. NCL Schema to C/C++ Code

Figure 6.6 illustrates the compilation of a method (query_execution) with its
associated before- and immediate-after rules (rule_query_modification and
rule_data_conversion) into the compiled code for distributed event monitoring
and rule processing.

[Figure 6.6 (diagram not reproduced). Recoverable labels: query_execution_R2,
the rule-processing code that calls data_conversion(ps.result); and the C/C++
code that implements the original query_execution (renamed to
query_execution_p).]

Figure 6.6. Compilation of an NCL Method query_execution() and Its Associated
Rules


As shown earlier, the NCL definition of the SQP class of the component system
includes the specification of one main method (query_execution) and two
mediation rules (rule_query_modification and rule_data_conversion). As
described in Chapter 4, NCL rules are Event-Condition-Action-AlternativeAction
(ECAA) rules in which the triggering event can be the execution of any method.
To illustrate the process of generating the enhanced program binding, we use
these two mediation rules. They are the before- and immediate-after rules
associated with the same method, query_execution, shown below:

RULE SQP::rule_query_modification;
    TRIGGER BEFORE query_execution(ps)
    CONDITION ps.s_schema <> ps.t_schema
    ACTION
        modify_query(ps);
END_RULE;

RULE SQP::rule_data_conversion;
    TRIGGER IMMEDIATE AFTER query_execution(ps)
    ACTION
        data_conversion(ps.result);
END_RULE;

Rule_query_modification specifies that before method query_execution is
executed, the condition ps.s_schema <> ps.t_schema is checked to see whether
the source and target schemas of the subquery are different. If the condition
evaluates to True, the mediation method modify_query is called to perform query
modification; otherwise, no action is needed. Rule_data_conversion specifies
that immediately after method query_execution is executed, the other mediation
method, data_conversion, is called to perform data conversion.
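
The semantics of the two rules can be mimicked in a few lines. The sketch below
is a hedged Python illustration of an ECAA rule object, not NCL and not the
code emitted by the NCL compiler; the ECAARule and Subquery classes, the fire
method and the log list are invented for the example, and the actions merely
record which mediation method would be called.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Subquery:
    s_schema: str
    t_schema: str
    log: list  # records which mediation methods were (notionally) called


@dataclass
class ECAARule:
    """Event-Condition-Action-AlternativeAction rule (illustrative only)."""
    name: str
    timing: str  # "BEFORE" or "IMMEDIATE AFTER"
    event: str   # name of the triggering method
    action: Callable[[Subquery], None]
    condition: Optional[Callable[[Subquery], bool]] = None
    alt_action: Optional[Callable[[Subquery], None]] = None

    def fire(self, ps):
        # A rule without a CONDITION clause always runs its ACTION.
        if self.condition is None or self.condition(ps):
            self.action(ps)
        elif self.alt_action is not None:
            self.alt_action(ps)


rule_query_modification = ECAARule(
    name="rule_query_modification", timing="BEFORE", event="query_execution",
    condition=lambda ps: ps.s_schema != ps.t_schema,
    action=lambda ps: ps.log.append("modify_query"),
)
rule_data_conversion = ECAARule(
    name="rule_data_conversion", timing="IMMEDIATE AFTER", event="query_execution",
    action=lambda ps: ps.log.append("data_conversion"),
)

same = Subquery("DB_1", "DB_1", [])
diff = Subquery("DB_1", "DB_2", [])
for ps in (same, diff):
    rule_query_modification.fire(ps)
    rule_data_conversion.fire(ps)
print(same.log)  # ['data_conversion']
print(diff.log)  # ['modify_query', 'data_conversion']
```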

During the compilation of the SQP class by the NCL compiler, a C or C++ method
is generated for each rule. For rule_query_modification, the C or C++ code in
method query_execution_R1 checks the condition ps.s_schema <> ps.t_schema and
calls method modify_query if the condition evaluates to True. Similarly, for
rule_data_conversion, a C or C++ method query_execution_R2 is generated.

For each method in the class, an equivalent IDL specification is generated; for
example, an IDL specification is generated for query_execution. Furthermore, a
new implementation of query_execution (i.e., a surrogate query_execution in C
or C++ code) is generated. The new implementation consists of three method
calls. First, a call to method query_execution_R1 is made to process
rule_query_modification (i.e., the before-rule for query_execution). Then, a
call is made to the original implementation of query_execution, which is
renamed query_execution_p, to perform the query dispatching. Finally, a call is
made to method query_execution_R2 to process rule_data_conversion (i.e., the
immediate-after rule for query_execution).
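
The renaming and wrapping just described can be imitated at run-time in a
dynamic language. The following Python sketch is only an analogy for what the
NCL compiler does at build-time in C or C++: install_surrogate and the toy SQP
class are assumptions made for the example, while the names query_execution_R1,
query_execution_p and query_execution_R2 follow the text.

```python
class SQP:
    """Toy subquery processor; only the call order is of interest here."""

    def __init__(self):
        self.trace = []

    def query_execution(self, ps):
        # Original method body (stands in for the local query processing).
        self.trace.append("query_execution_p")
        ps["result"] = ["raw row"]


def install_surrogate(cls, method, before, after):
    """Rename cls.<method> to <method>_p and install a surrogate in its place."""
    original = getattr(cls, method)
    setattr(cls, method + "_p", original)

    def surrogate(self, ps):
        before(self, ps)                  # plays the role of ..._R1
        getattr(self, method + "_p")(ps)  # original implementation
        after(self, ps)                   # plays the role of ..._R2

    setattr(cls, method, surrogate)


def query_execution_R1(sqp, ps):
    # CAA part of the before-rule: condition, then action.
    if ps["s_schema"] != ps["t_schema"]:
        sqp.trace.append("modify_query")


def query_execution_R2(sqp, ps):
    # CAA part of the immediate-after rule: unconditional action.
    sqp.trace.append("data_conversion")


install_surrogate(SQP, "query_execution", query_execution_R1, query_execution_R2)

sqp = SQP()
sqp.query_execution({"s_schema": "DB_1", "t_schema": "DB_2"})
print(sqp.trace)  # ['modify_query', 'query_execution_p', 'data_conversion']
```

Callers keep using the name query_execution, so the interface seen through the
binding is unchanged while the rule processing is added around the original
body.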

In the final step of the compilation process, the IDL compiler is used to
generate the C or C++ bindings for all the methods that have been specified in
IDL, including method query_execution. After the bindings have been generated,
the corresponding C or C++ implementation code for the surrogate
query_execution, query_execution_R1, query_execution_p (the original
query_execution) and query_execution_R2 can be inserted into the skeleton of
query_execution.


CHAPTER  7 
EXPERIMENTATION  RESULTS 


This chapter presents the results of a run-time mediation execution using the
implemented mediation system. Section 7.1 demonstrates how the compiled
mediation rules are executed at run-time to perform the mediation process.
Section 7.2 then explains a general execution model of distributed query
processing and mediation. Finally, Section 7.3 presents an example of mediated
distributed query processing by showing its data and execution flows.

7.1    Run-time  Mediation  Rule  Execution 

The enhanced binding example in the previous chapter shows the build-time
compilation of the mediation rules and the SQP method associated with them.
This section describes the run-time execution of that example. Figure 7.1
illustrates how a service request for query_execution in a component system is
monitored to trigger the processing of the mediation rules. First, a client
makes a query request by using the programming language binding (i.e., an IDL
stub) generated by the IDL compiler for the method global_query_execution of
the DQP. When the request is made, the ORB dispatches it to the DQP server to
invoke that method.

[Figure 7.1 (diagram not reproduced). Legend: direct method call; triggered
method call. The CLIENT calls global_query_execution() of the DQP through the
ORB. In the DQP, global_query_execution calls dispatch_subquery(), which calls
DB_1::query_execution() through the ORB. In the SUBQUERY PROCESSOR (SQP) of
DB_1, the surrogate query_execution calls query_execution_R1(),
query_execution_p() and query_execution_R2() in turn. query_execution_R1
implements the CAA part of the before-rule: if ps.s_schema <> ps.t_schema
(condition part), it calls modify_query() (action part). query_execution_p
holds the original query_execution() code. query_execution_R2 implements the
CAA part of the after-rule and calls data_conversion() (action part).
modify_query and data_conversion are methods of the MEDIATOR of DB_1.]

Figure 7.1. Run-time Execution Flow of Mediation Rules

During the execution of the method global_query_execution, a local call to
dispatch_subquery is made to dispatch subqueries to component systems for local
query processing. In Figure 7.1, a subquery is dispatched to DB_1 by calling
the method query_execution (Step (1)); in this case, however, the original code
for query_execution is not invoked directly. Instead, the generated
implementation (i.e., the surrogate query_execution generated by the NCL
compiler) is executed. First, the method query_execution_R1 is invoked (Step
(2)). Its execution checks the condition that compares the source and target
schemas. If the condition evaluates to True, the action part of the rule is
executed and a call to modify_query() is made locally (Step (3)).

After query_execution_R1 has processed the before-rule
rule_query_modification, a call to the original query_execution (i.e.,
query_execution_p) is made to execute the code that implements the actual local
query processing requested by the client (Step (4)). After the original
query_execution has been executed, a call to query_execution_R2 is made to
process the immediate-after rule, which performs data conversion (Step (5)).

7.2    Distributed  Query  Processing  and  Mediation 

This section describes a general model of global query execution in the system
from the data flow perspective. The execution of a global query includes the
following steps:

1. Query decomposition: In general, a global query may be complex and involve
   more than one ENTITY class stored in different component systems. Such a
   query is decomposed before execution. As shown in Figure 7.2, a global query
   (QueryA) involving two classes is decomposed into two simple subqueries
   (i.e., each containing a single class), SubqueryA1 and SubqueryA2.

[Figure 7.2 (diagram not reproduced). Global QueryA is decomposed into
SubqueryA1 and SubqueryA2. Query modification: SubqueryA1 to System A (no
change); SubqueryA1 to System B, modified to SubqueryB1; SubqueryA2 to System A
(no change); SubqueryA2 to System C, modified to SubqueryC2. The four
subqueries are then processed. Data conversion: the data returned from
SubqueryA1 and SubqueryA2 need no conversion; the other returned data are
converted into System A's representation. Data assembly 1 (UNION) and data
assembly 2 (UNION) feed data assembly 3 (JOIN).]

Figure 7.2. Mediated Distributed Query Processing


2. Subquery propagation: For each decomposed subquery, the KBMS is queried for
   meta information to determine whether other information sources contain
   relevant data to answer the subquery. This meta information is derived from
   the mediation specifications. In this example, System B also contains data
   to answer SubqueryA1, and System C also contains data to answer SubqueryA2.
   Therefore, replicated copies of SubqueryA1 and SubqueryA2 are generated for
   System B and System C, respectively.

3. Subquery modification: SubqueryA1 dispatched to System B may not be
   processable, since naming and structural differences may exist between
   Systems A and B. SubqueryA1 is therefore modified into SubqueryB1, which is
   suitable for processing in System B. Similarly, SubqueryA2 is modified into
   SubqueryC2 for processing in System C. The two other subqueries (SubqueryA1
   and SubqueryA2), dispatched to System A, are not changed.

4. Subquery execution: All four subqueries are executed locally in Systems A, B
   and C. The data results of the four subqueries are shown as DataA1, DataB1,
   DataA2 and DataC2.

5. Data conversion: The data retrieved from Systems B and C do not conform to
   System A's representation. Data conversions are performed to convert DataB1
   into DataA1' and DataC2 into DataA2'. There is no need to convert DataA1 and
   DataA2 since they are already in System A's data representation.

6. Data assembly: DataA1 and DataA1' are assembled by performing a union
   operation, since both sets of data answer the subquery on the same class.
   Similarly, DataA2 and DataA2' of SubqueryA2 are assembled by performing a
   union operation. Finally, the two sets of data that answer SubqueryA1 and
   SubqueryA2 are joined based on the join conditions and association
   constraints specified in the original query.
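
The six steps can be traced end to end on toy data. The sketch below is a
simplified Python simulation under stated assumptions, not the prototype's
implementation: the stored rows, the class names C1 and C2, the attribute
names, the RELATED and A_TO tables (standing in for KBMS meta information) and
the join on "id" are all invented, and only naming differences are mediated.

```python
# Hypothetical local data; attribute names differ between systems.
STORES = {
    ("A", "C1"): [{"id": 1, "x": 10}],
    ("B", "C1"): [{"key": 2, "xval": 20}],
    ("A", "C2"): [{"id": 1, "y": "a"}],
    ("C", "C2"): [{"ident": 2, "yy": "b"}],
}
# Hypothetical mediation information: related systems per class, and how
# System A's attribute names map to each other system's names.
RELATED = {"C1": ["B"], "C2": ["C"]}
A_TO = {"B": {"id": "key", "x": "xval"}, "C": {"id": "ident", "y": "yy"}}


def modify(attrs, system):
    """Step 3: rename System A's attribute names into the target's names."""
    mapping = A_TO.get(system, {})
    return [mapping.get(a, a) for a in attrs]


def run_local(system, cls, attrs):
    """Step 4: project the locally stored rows onto the requested attributes."""
    return [{a: row[a] for a in attrs} for row in STORES[(system, cls)]]


def convert(rows, local_attrs, attrs):
    """Step 5: rename returned columns back to System A's names."""
    back = dict(zip(local_attrs, attrs))
    return [{back[k]: v for k, v in row.items()} for row in rows]


def execute_global_query(subqueries, home="A"):
    assembled = []
    for cls, attrs in subqueries:                     # output of step 1
        rows = []
        for system in [home] + RELATED.get(cls, []):  # step 2: propagation
            local_attrs = modify(attrs, system)
            data = run_local(system, cls, local_attrs)
            # Step 6a: union (duplicates are not removed in this sketch).
            rows.extend(convert(data, local_attrs, attrs))
        assembled.append(rows)
    left, right = assembled                           # step 6b: join on "id"
    return [{**l, **r} for l in left for r in right if l["id"] == r["id"]]


result = execute_global_query([("C1", ["id", "x"]), ("C2", ["id", "y"])])
print(result)
```

For the home system the modification and conversion steps reduce to the
identity, which mirrors the "no change" and "no conversion" branches in Figure
7.2.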

7.3    An  Example  of  Mediated  Global  Query  Processing 

Since the emphasis of this research is on the information mediation process
rather than on distributed query processing, we use a simple query example that
retrieves data from multiple structurally different component systems, as given
in Section 4.3. This example is sufficient to illustrate three major tasks of
the mediation process: locating information sources, subquery modification, and
data conversion.

As shown in Figure 7.3, an OQL query for retrieving the stock data is issued
based on the view of schema DB_1. The query retrieves the stock trade price and
date information relating to the IBM company on or after the date 1/1/1995.
This query is passed as a string-valued parameter to the query processing
method, global_query_execution().

Upon receiving the query, the DQP initializes, parses, checks and decomposes
the query into simple subqueries, each of which accesses data from a single
component system. Since the query given above is already simple, it is not
decomposed. The query is stored in a data structure for initialization. After
the query is initialized, rule_query_propagation is triggered, as described
earlier, to access the mediation information from the KBMS. The
meta-information of the global schema indicates that two other information
sources (i.e., DB_2 and DB_3) contain relevant data. Thus, three instances are
established in the entity class Query to store the original query and the
schema identifiers of the source and target systems. Each entry represents a
replicated subquery to be dispatched to the SQP of a component system for local
query processing. However, the two "replicated" subqueries are not ready for
local processing since their representations are still based on the DB_1
schema. They need to be modified.

Client Program:

    q: Query_Processor;
    query: STRING;
    result: SET OF Data;

    query := "CONTEXT s:DB_1::Stock
              WHERE s.Date >= '1/1/1995' AND s.StkCode = 'IBM'
              RETRIEVE s.Date, s.TradePrice";

    result := q.global_query_execution(query);
    result.value_set.display();

Figure 7.3. Example of a Global Query Processing

Rule_query_modification, associated with the SQP of each component system, is
triggered before the three subqueries are executed. By comparing the schema
identifiers stored in the class Query, it detects that two of them (i.e., those
replicated for DB_2 and DB_3) need to be modified. After the execution of the
modify_query method, these two subqueries become SQ2 and SQ3, as shown in
Figure 7.3, which are represented in the views of schemas DB_2 and DB_3,
respectively. All three subqueries are then processed in their respective
component systems. The data returned from DB_1, DB_2 and DB_3 are shown as D1,
D2 and D3, respectively. As assumed previously, the representation of stock
data in DB_1 is different from those in DB_2 and DB_3. Since the query is
issued based on the view of schema DB_1, the returned data should conform to
its representation. Mediation operations are therefore required to convert D2
and D3 into the New York Stock Exchange representation.

To achieve this, rule_data_conversion is triggered immediately after the stock
data are retrieved from each component system. The method convert() is invoked
to convert the data returned from DB_2 and DB_3 into the DB_1 data
representation (i.e., the New York Stock Exchange representation). After the
data conversion, the returned data are uniformly represented, as shown in the
dotted rectangle labelled Result Conversion in Figure 7.3. The three sets of
data are then assembled and the final result is returned to the
user/application program that issued the query. A complete execution result of
this query can be found in Appendix C.
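
The column-by-column check made by rule_data_conversion can be sketched as
follows. This is a hedged Python illustration only. The attribute names Date
and TradePrice come from the example query, but the DB_2 attribute name
TradeDate, its date format, the sample row and the helper names are invented
for the example and do not reproduce the schemas of Section 4.3.

```python
def iso_to_us_date(value):
    """Turn 'YYYY-MM-DD' into 'M/D/YYYY' (hypothetical DB_2 date format)."""
    year, month, day = value.split("-")
    return f"{int(month)}/{int(day)}/{year}"


# (source name, target name) pairs, as inserted by query modification.
COLUMN_PAIRS = [("Date", "TradeDate"), ("TradePrice", "TradePrice")]
# Hypothetical value converters, keyed by the target-side attribute name.
CONVERTERS = {"TradeDate": iso_to_us_date}


def data_conversion(rows):
    """Convert rows keyed by target names into rows keyed by source names."""
    converted = []
    for row in rows:
        out = {}
        for source_name, target_name in COLUMN_PAIRS:
            value = row[target_name]
            if source_name != target_name:  # naming difference found
                value = CONVERTERS.get(target_name, lambda v: v)(value)
            out[source_name] = value
        converted.append(out)
    return converted


d2 = [{"TradeDate": "1995-01-03", "TradePrice": 73.5}]  # invented sample row
print(data_conversion(d2))  # [{'Date': '1/3/1995', 'TradePrice': 73.5}]
```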


CHAPTER  8 
CONCLUSION  AND  FUTURE  WORK 

8.1  Conclusion 

In this dissertation, we presented a mediation system based on a common
modeling language, NCL, which resolves data/information model differences, and
a mediation specification language, MSL, which mediates the data heterogeneity
problem. The two languages have been implemented and effectively achieve their
individual objectives. At build-time, the component systems are uniformly
modeled in NCL and their interrelationships are specified in MSL. The MSL
specifications are then translated into mediation elements expressed as NCL
specifications. The mediated global schema, combining the NCL specifications of
component systems and mediation elements, is compiled to store the meta
information in the KBMS and to generate enhanced CORBA program bindings and
mediation code. The program bindings link the different components of the
system (e.g., the DQP, the KBMS, clients, and the SQPs and Mediators of
information sources) together in a CORBA-based client/server distributed
network environment.

At run-time, the client initiates a global query request by making a method
call to the DQP with an OQL query as the parameter. The DQP then checks and
decomposes the global query into simple subqueries. Before subquery
dispatching, a mediation rule is triggered to retrieve the meta information
from the KBMS and to locate information sources that contain relevant data.
Based on the located information sources, additional subqueries are propagated
and then dispatched to the target systems for local processing. Before local
query execution, a mediation rule is triggered to modify the query if a
different view of data exists. After local query execution, another mediation
rule is triggered to convert the data into the uniform representation specified
in the original query. The DQP then collects and assembles the returned data
and returns the final result to the client.

Compared to other mediation research, this work has the following distinct
features.

1. Resolving both the data/information model difference and the data
   heterogeneity problem in a unified framework: This work combines and
   integrates the research efforts on two languages in a unified system. It
   provides a solution to both the model and data heterogeneity problems found
   in heterogeneous information systems.

2. Compilation approach to generate efficient code: Unlike other mediation
   works that use interpreted logic rule inferencing, a compilation approach is
   taken to compile the mediation specification into program bindings and
   executables. It improves query processing performance by distributing local
   query processing, rule code and mediation code to the different component
   systems, which perform the query, rule and mediation processing.

3. Schema mediation to allow users to see data in their own views: Instead of
   performing schema integration, mediation on component schemas preserves the
   views of local users. Users can issue global queries based on the views they
   are already familiar with to retrieve data stored in other information
   sources.

4. Supporting mediation system extensibility and scalability: The mediation
   specification is based upon an O-O hierarchy, which provides better
   extensibility and scalability. In this hierarchy, mediation specifications
   defined in lower levels can be reused to generate upper-level mediation
   specifications, which saves the effort of creating and maintaining the
   mediation information. In addition, the cost of changing the mediation
   specification is minimized, since only the mediation specifications in an
   affected path of the hierarchy need to be updated. All the others remain the
   same.

5. Distributed mediators to support localized mediation: Instead of using a
   centralized mediator, the functions of mediation are distributed to the
   information sources. The mediation operations (i.e., query modification and
   data conversion) are carried out locally to avoid sending queries and data
   back and forth between the DQP and a centralized mediator.
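
The "affected path" argument in feature 4 can be made concrete with a small
sketch. The Python fragment below is an illustration under assumptions: the
hierarchy is modeled as a simple child-to-parent table, and every node name in
it is invented rather than taken from the mediation hierarchy of this work.

```python
# Hypothetical multilevel mediation hierarchy, stored as child -> parent.
PARENT = {
    "DB_1.Stock": "US_Stock",
    "DB_2.Stock": "US_Stock",
    "DB_3.Stock": "Asia_Stock",
    "US_Stock": "Global_Stock",
    "Asia_Stock": "Global_Stock",
}


def affected_path(changed):
    """Return the changed specification and every ancestor built from it."""
    path = [changed]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


path = affected_path("DB_3.Stock")
untouched = sorted((set(PARENT) | set(PARENT.values())) - set(path))
print(path)       # ['DB_3.Stock', 'Asia_Stock', 'Global_Stock']
print(untouched)  # ['DB_1.Stock', 'DB_2.Stock', 'US_Stock']
```

Only the specifications on the path from the changed node to the root are
revisited; sibling branches are left as they are.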

Figure 8.1 shows the differences between our schema mediation approach and that
of the TSIMMIS project [CHAW94] in the following five aspects: the means of
forming mediation specifications, the user's views in issuing queries, the
computation approach, run-time query processing, and the execution of mediation
operations.

Figure 8.1. Differences between Our Mediation Approach and TSIMMIS'

8.2    Future  Work 


In this work, we have implemented a prototype mediation system that is tightly
coupled with distributed query processing. The performance of the mediation
system depends heavily on the efficiency of distributed query processing, which
is not optimized at present. The mediation system can be extended in the
following areas:

1. Query optimization can make use of the following information and approaches:

   • Local QP's capabilities: In our implementation of the DQP, global queries
     are decomposed into simple subqueries (i.e., each involving only a single
     class), which are sent directly to the component systems for local
     processing. However, in a heterogeneous information system, some component
     systems may have more query processing capabilities, such as performing
     multi-way JOINs. In that case, more complex subqueries can be formed for
     these component systems so that they can be processed concurrently instead
     of performing the joins in the DQP.

   • MEDIATION-TYPE constraint: The MEDIATION-TYPE clause in the mediation
     specification captures the interrelationships among a set of related
     classes. The relation operators ONEOF, ANDOR and AND indicate that the
     instances of these subclasses have Set-Exclusion, Set-Intersection and
     Set-Equality relationships, respectively. This meta information stored in
     the KBMS can be used to optimize subquery generation. For example, if a
     set-equality relationship exists between two subclasses of different
     component systems, the query can be sent to either one of them but not
     both, since both classes contain the same set of instances.

   • Network cost information: In a distributed information system, data
     transmission and communication in the network are important performance
     factors. The time it takes to transmit data between one pair of component
     systems may differ from that of another pair. Also, multiple component
     systems may contain the same data. If the DQP can select the component
     systems that process subqueries and transmit data faster than others, the
     processing time of a global query can be greatly reduced. Network
     transmission cost information can be stored and managed by the meta
     information manager (i.e., the KBMS) and used by the DQP at run-time to
     select the proper component systems for query processing.

   • Parallel execution of subqueries: The current implementation of the DQP
     executes subqueries sequentially. To improve query processing performance,
     parallel processing of subqueries is necessary. This can be achieved by
     forking multiple concurrent processes or by using the multithread facility
     provided by the CORBA communication infrastructure.

2. Distributed transaction management: Distributed query processing and
mediation need to be carried out within the framework of distributed
transaction management. Issues related to concurrency control and error
recovery also need to be dealt with in our future work.

3. Global semantic associations: In a heterogeneous information system, semantic
associations among data entities across multiple component systems may exist.
This information is not available in any single component system. It can
be stored and managed by the KBMS.

4. Management of meta information: To support the use of local QP's capabilities
and network cost information in query optimization, the meta schema of
the KBMS needs to be extended. Additional attributes for carrying the extra
information need to be introduced and associated with the meta class, Schema.
With this extension, the KBMS will be able to provide the DQP with the
information it needs to improve its query processing performance at run-time.
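
The source-selection and parallel-execution ideas in item 1 can be illustrated with a minimal sketch. It is not part of the prototype: the names Source, cost, equal_group, select_sources and run_subqueries are hypothetical, the cost value stands in for whatever network cost information the KBMS would hold, and Python threads stand in for the CORBA multithreading facility.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Source:
    """A component system as the DQP might see it (hypothetical model)."""
    name: str                          # e.g. "DB_1"
    cost: float                        # assumed transmission cost, lower is better
    equal_group: Optional[str] = None  # sources in one group hold the same
                                       # instances (Set-Equality, operator AND)


def select_sources(sources: List[Source]) -> List[Source]:
    """Keep only the cheapest source of each set-equality group.

    Sources without a group are always kept, since nothing says their
    instances are available elsewhere.
    """
    chosen = {}
    for src in sources:
        key: Tuple[str, str] = (
            ("group", src.equal_group) if src.equal_group else ("source", src.name)
        )
        if key not in chosen or src.cost < chosen[key].cost:
            chosen[key] = src
    return sorted(chosen.values(), key=lambda s: s.name)


def run_subqueries(sources: List[Source],
                   execute: Callable[[Source], list]) -> list:
    """Run one subquery per source in parallel and concatenate the rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # map() yields results in the order of `sources`
        per_source = list(pool.map(execute, sources))
    return [row for rows in per_source for row in rows]


if __name__ == "__main__":
    candidates = [
        Source("DB_1", cost=5.0, equal_group="stock"),
        Source("DB_2", cost=2.0, equal_group="stock"),
        Source("DB_3", cost=9.0),
    ]
    picked = select_sources(candidates)
    print([s.name for s in picked])
    print(run_subqueries(picked, lambda s: [(s.name, "row")]))
```

In this sketch the set-equality information prunes redundant sources before any subquery is sent, and the cost value only breaks ties within a group; a real optimizer would also have to weigh the local query processing capabilities of each source.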


APPENDIX  A 
BNF  OF  THE  NIIIP  COMMON  LANGUAGE  (NCL) 


/* Starting Symbol: ncl_file */

ncl_file
    : defs

defs
    : /* NULL */
    | defs def

def
    : class_def
    | method_body_spec
    | global_rule_spec
    | program_def

class_def
    : T_DEFINE class_type class_id opt_equal_spec
      opt_in_schema_spec
      opt_class_specs
      T_END_DEFINE T_SEMI

class_type
    : T_CLASS_TYPE

class_id
    : T_ID

opt_equal_spec
    : /* NULL */
    | T_EQUAL underlying_type

opt_in_schema_spec
    : /* NULL */
    | T_IN T_ID

opt_class_specs
    : /* NULL */
    | class_specs

class_specs
    : subsuper T_SEMI
      explicit_attr_list
      derived_clause
      inverse_clause
      unique_clause
      where_clause
      assoc_section
      method_section
      local_rule_section
      code_section
    | T_SEMI constant_decl


/* global rule specification */

global_rule_spec
    : rule_decls

/* SUPERTYPE/SUBTYPE declaration */

subsuper
    : /* NULL */
    | supertype_decl subtype_decl
    | subtype_decl supertype_decl
    | subtype_decl
    | supertype_decl

supertype_decl
    : T_ABSTRACT T_SUPERTYPE
    | T_ABSTRACT T_SUPERTYPE T_OF
      T_LEFT_PAREN supertype_expr T_RIGHT_PAREN
    | T_SUPERTYPE T_OF
      T_LEFT_PAREN supertype_expr T_RIGHT_PAREN

supertype_expr
    : supertype_factor
    | supertype_expr T_AND supertype_factor
    | supertype_expr T_ANDOR supertype_factor

supertype_factor
    : T_ID
    | oneof
    | T_LEFT_PAREN supertype_expr T_RIGHT_PAREN

oneof
    : T_ONEOF T_LEFT_PAREN supertype_list T_RIGHT_PAREN

supertype_list
    : supertype_expr
    | supertype_list T_COMMA supertype_expr

subtype_decl
    : T_SUBTYPE T_OF T_LEFT_PAREN identifier_list T_RIGHT_PAREN

identifier_list
    : T_ID
    | identifier_list T_COMMA T_ID


/* explicit attribute declarations */

explicit_attr_list
    : /* NULL */
    | explicit_attr_list explicit_attr

explicit_attr
    : attribute_list T_COLON opt base_type opt_where_cons T_SEMI

opt_where_cons
    : /* NULL */
    | T_WHERE T_LEFT_PAREN add_cons T_RIGHT_PAREN

add_cons
    : T_CONS_ID opt_para
    | add_cons T_COMMA T_CONS_ID opt_para

opt_para
    : /* NULL */
    | T_LEFT_PAREN bound_colon_list T_RIGHT_PAREN
    | expr

attribute_list
    : attribute_decl
    | attribute_list T_COMMA attribute_decl

attribute_decl
    : T_ID
    | qualified_attribute

qualified_attribute
    : T_SELF group_qualifier attribute_qualifier

attribute_qualifier
    : T_DOT T_ID

group_qualifier
    : T_BACKSLASH T_ID

opt
    : /* NULL */
    | T_OPTIONAL

bound_spec
    : T_LEFT_BRACKET bound_1 T_COLON bound_2 T_RIGHT_BRACKET

bound_1
    : expr

bound_2
    : expr


expr
    : expr_list_expr
    | binary_operation
    | unary_operation
    | list_expr
    | T_CONTEXT context_expr
    | existential_quant
    | universal_quant
    | parenthesis_expr
    | bracket_expr
    | dotted_expr
    | single_item
    | interval_expr
    | query_expr

/* exp { exp,exp,...} */

expr_list_expr
    : expr list_expr

list_expr
    : T_LEFT_CURL expr_list T_RIGHT_CURL

expr_list
    : /* NULL */
    | expr_ls

expr_ls
    : expr
    | expr_ls T_COMMA expr

/* exp <op> exp */

binary_operation
    : expr T_ASSIGNMENT expr
    | binary1_operation
    | binary2_operation
    | binary3_operation

binary1_operation
    : expr T_LESS_THAN expr
    | expr T_LESS_EQUAL expr
    | expr T_GREATER_THAN expr
    | expr T_GREATER_EQUAL expr
    | expr T_NOT expr
    | expr T_EQUAL expr
    | expr T_NOT_EQUAL expr
    | expr T_IN expr
    | expr T_LIKE expr
    | expr T_INST_NOT_EQUAL expr
    | expr T_INST_EQUAL expr

binary2_operation
    : expr T_OR expr
    | expr T_PLUS expr
    | expr T_MINUS expr
    | expr T_XOR expr

binary3_operation
    : expr T_TIMES expr
    | expr T_DIV expr
    | expr T_MOD expr
    | expr T_AND expr
    | expr T_DOLLAR expr
    | expr T_REAL_DIV expr
    | expr T_CONCAT_OP expr
    | expr T_EXP expr

/* <op> expr */

unary_operation
    : T_LESS_THAN expr
    | T_LESS_EQUAL expr
    | T_GREATER_THAN expr
    | T_GREATER_EQUAL expr
    | T_NOT expr
    | T_EQUAL expr
    | T_NOT_EQUAL expr
    | T_OR expr
    | T_PLUS expr
    | T_MINUS expr
    | T_TIMES expr
    | T_DIV expr
    | T_MOD expr
    | T_AND expr
    | T_AMPERSAND expr

context_expr
    : pattern_spec opt_where_clause opt_select_clause

pattern_spec
    : context_expr_list
    | binary_context
    | T_LEFT_PAREN pattern_spec T_RIGHT_PAREN
    | single_context

context_expr_list
    : T_LEFT_CURL pattern_list T_RIGHT_CURL

pattern_list
    : pattern_list_spec
    | pattern_list T_COMMA pattern_list_spec

pattern_list_spec
    : binary_context
    | unary_context
    | single_context

binary_context
    : binary1_context
    | binary2_context
    | binary3_context

binary1_context
    : pattern_spec T_AND context_expr_list
    | pattern_spec T_OR context_expr_list

binary2_context
    : pattern_spec T_BANG opt_link pattern_spec
    | pattern_spec T_NASSOC_L opt_link pattern_spec
    | pattern_spec T_NASSOC_R opt_link pattern_spec

opt_link
    : /* EMPTY */
    | T_LEFT_BRACKET T_ID T_RIGHT_BRACKET

binary3_context
    : pattern_spec T_TIMES opt_link pattern_spec
    | pattern_spec T_ASSOC_L opt_link pattern_spec
    | pattern_spec T_ASSOC_R opt_link pattern_spec

unary_context
    : T_TIMES opt_link pattern_spec
    | T_ASSOC_L opt_link pattern_spec
    | T_ASSOC_R opt_link pattern_spec
    | T_BANG opt_link pattern_spec
    | T_NASSOC_L opt_link pattern_spec
    | T_NASSOC_R opt_link pattern_spec

single_context
    : T_ID
    | T_ID T_COLON T_ID
    | T_ID T_TWO_COLON T_ID
    | T_ID T_COLON T_ID T_TWO_COLON T_ID

opt_where_clause
    : /* EMPTY */
    | T_WHERE expr

opt_select_clause
    : /* EMPTY */
    | T_SELECT identifier_list

variable_list
    : identifier_list


/* EXIST ID, ID,... IN <context_exp> */

existential_quant
    : T_EXIST variable_list T_IN context_expr

/* FORALL ID, ID,... IN <context_exp> SUCHTHAT <exp> */

universal_quant
    : T_FORALL variable_list T_IN context_expr
      T_SUCHTHAT expr

/* ( <exp> ) */

parenthesis_expr
    : T_LEFT_PAREN expr T_RIGHT_PAREN

/* [ <exp> ] */

bracket_expr
    : expr brackets

brackets
    : T_LEFT_BRACKET expr T_RIGHT_BRACKET

/* <exp> . ID or <exp>.method_call */

dotted_expr
    : expr T_DOT T_ID
    | expr T_DOT method_invocation

/* { <exp> <= <exp> <= <exp> } */

interval_expr
    : T_LEFT_CURL expr interval_op
      expr interval_op
      expr T_RIGHT_CURL

interval_op
    : T_LESS_EQUAL
    | T_LESS_THAN

/* QUERY ( ID <* <exp> | <exp> ) */

query_expr
    : T_QUERY T_LEFT_PAREN qualified_name
      T_Q_STAR expr T_BAR
      expr T_RIGHT_PAREN

/* single item (to terminals) */

single_item
    : method_invocation
    | qualified_name
    | literal
    | qualifiable_factor qualifier_list

literal
    : T_BINARY_LITERAL
    | T_INTEGER_LITERAL
    | T_LOGICAL_LITERAL
    | T_REAL_LITERAL
    | T_STRING_LITERAL
    | T_ENCODED_STRING_LITERAL

qualifiable_factor
    : constant_factor
    | T_ID qualifier

constant_factor
    : built_in_constant

built_in_constant
    : T_CONST_E
    | T_PI
    | T_SELF
    | T_QUESTIONMARK
    | T_NULL

/* appended */

method_invocation
    : T_BUILTIN_METHOD T_LEFT_PAREN expr_list T_RIGHT_PAREN
    | T_ID T_LEFT_PAREN expr_list T_RIGHT_PAREN

qualifier_list
    : /* NULL */
    | qualifier_list qualifier

qualifier
    : attribute_qualifier
    | group_qualifier
    | subcomponent_qualifier

subcomponent_qualifier
    : T_LEFT_BRACKET expr T_RIGHT_BRACKET
    | T_LEFT_BRACKET expr T_COLON expr T_RIGHT_BRACKET


base_type
    : aggregation_types
    | simple_types
    | qualified_name

aggregation_types
    : array_type
    | bag_type
    | set_type
    | list_type

array_type
    : T_ARRAY bound_spec T_OF opt unique base_type

unique
    : /* NULL */
    | T_UNIQUE

bag_type
    : T_BAG gen_bound_spec T_OF base_type

gen_bound_spec
    : bound_spec
    | /* NULL */

set_type
    : T_SET gen_bound_spec T_OF base_type

list_type
    : T_LIST gen_bound_spec T_OF unique base_type

simple_types
    : T_BINARY precision_spec opt_fixed
    | T_BOOLEAN
    | T_INTEGER precision_spec
    | T_LOGICAL
    | T_NUMBER
    | T_REAL precision_spec
    | T_STRING precision_spec opt_fixed
    | T_VOID

precision_spec
    : /* NULL */
    | T_LEFT_PAREN expr T_RIGHT_PAREN

opt_fixed
    : /* NULL */
    | T_FIXED


/* DERIVE clause */

derived_clause
    : T_DERIVE derived_attr_list
    | /* NULL */

derived_attr_list
    : derived_attr
    | derived_attr_list derived_attr

derived_attr
    : attribute_decl T_COLON base_type
      T_ASSIGNMENT expr T_SEMI

/* INVERSE clause */

inverse_attr
    : T_ID T_COLON inverse_type
      T_FOR T_ID T_SEMI

inverse_attr_list
    : inverse_attr
    | inverse_attr_list inverse_attr

inverse_clause
    : T_INVERSE inverse_attr_list
    | /* NULL */

inverse_type
    : set_or_bag T_ID

set_or_bag
    : T_SET gen_bound_spec T_OF
    | T_BAG gen_bound_spec T_OF
    | /* NULL */

/* UNIQUE clause */

unique_clause
    : T_UNIQUE unique_rule_list
    | /* NULL */

unique_rule_list
    : unique_rule
    | unique_rule_list unique_rule

unique_rule
    : attribute_list T_SEMI
    | T_ID T_COLON attribute_list T_SEMI

/* WHERE (domain rule) clause */

where_clause
    : T_WHERE domain_rule_list
    | /* NULL */

domain_rule_list
    : domain_rule
    | domain_rule_list domain_rule

domain_rule
    : T_ID T_COLON expr T_SEMI
    | expr T_SEMI


/* ASSOCIATION section */

assoc_section
    : T_ASSOCIATIONS T_COLON assoc_specs
    | /* NULL */

assoc_specs
    : assoc_spec
    | assoc_specs assoc_spec

assoc_spec
    : assoc_type opt_to_of
      attr_or_param_list
      opt_card_output T_SEMI

assoc_type
    : T_ASSOC_TYPE

opt_to_of
    : /* NULL */
    | T_TO
    | T_OF

attr_or_param_list
    : T_LEFT_PAREN attr_or_param_rep T_RIGHT_PAREN

attr_or_param_rep
    : T_ID
    | formal_param_rep

opt_card_output
    : /* NULL */
    | output_clause
    | card_clause

card_clause
    : T_CARDINALITY T_LEFT_PAREN sub_card_list
      T_RIGHT_PAREN

sub_card_list
    : sub_card
    | sub_card_list T_SEMI sub_card

sub_card
    : attr_colon_list T_EQUAL bound_colon_list T_SEMI

attr_colon_list
    : T_ID T_COLON T_ID

bound_colon_list
    : bound_spec T_COLON bound_spec

output_clause
    : T_OUTPUT_DATA T_LEFT_PAREN T_ID T_RIGHT_PAREN


/* METHOD declaration section */

method_section
    : T_METHODS T_COLON
      exception_list method_list
    | /* NULL */

exception_list
    : /* NULL */
    | exception_list exception_decl

exception_decl
    : T_EXCEPTION T_ID
      formal_param_list T_SEMI

formal_param_list
    : /* NULL */
    | T_LEFT_PAREN formal_param_rep T_RIGHT_PAREN

formal_param_rep
    : formal_param
    | formal_param_rep T_SEMI formal_param

formal_param
    : attribute_list T_COLON para_type

method_list
    : method_decl
    | method_list method_decl

method_decl
    : T_METHOD oneway T_ID
      method_params
      T_COLON return_type
      raise
      T_SEMI

oneway
    : /* NULL */
    | T_ONEWAY

method_params
    : T_LEFT_PAREN T_RIGHT_PAREN
    | T_LEFT_PAREN method_parameter_rep T_RIGHT_PAREN

method_parameter_rep
    : method_parameter
    | method_parameter_rep T_SEMI method_parameter

method_parameter
    : passing_type attribute_list T_COLON para_type

passing_type
    : T_IN
    | T_OUT
    | T_INOUT

para_type
    : generalized_types
    | simple_types
    | qualified_name

generalized_types
    : aggregate_type
    | general_aggregation_types
    | generic_type

aggregate_type
    : T_AGGREGATE T_OF para_type
    | T_AGGREGATE T_COLON T_ID T_OF para_type

general_aggregation_types
    : general_array_type
    | general_bag_type
    | general_set_type
    | general_list_type

general_array_type
    : T_ARRAY gen_bound_spec T_OF para_type

general_bag_type
    : T_BAG gen_bound_spec T_OF para_type

general_list_type
    : T_LIST gen_bound_spec T_OF para_type

general_set_type
    : T_SET gen_bound_spec T_OF para_type

generic_type
    : T_GENERIC
    | T_GENERIC T_COLON T_ID

return_type
    : T_VOID
    | para_type

raise
    : /* NULL */
    | T_RAISES exception_id_list

exception_id_list
    : T_LEFT_PAREN identifier_list T_RIGHT_PAREN


/* local RULE section */

local_rule_section
    : /* NULL */
    | T_RULES T_COLON rule_decls

rule_decls
    : rule_decl
    | rule_decls rule_decl

rule_decl
    : rule_head opt_rule_trigger rule_body
      T_END_RULE T_SEMI

rule_head
    : T_RULE T_ID T_SEMI

opt_rule_trigger
    : /* NULL */
    | rule_trigger

rule_trigger
    : T_TRIGGERED trigger_conds

trigger_conds
    : trigger_cond
    | trigger_conds trigger_cond

trigger_cond
    : trigger_time operation_list

trigger_time
    : T_BEFORE
    | T_AFTER
    | T_IMMED_AFTER

operation_list
    : operation_spec
    | operation_list T_COMMA operation_spec

operation_spec
    : qualified_name T_LEFT_PAREN opt_param_decls
      T_RIGHT_PAREN

qualified_name
    : T_ID
    | T_ID T_TWO_COLON T_ID

opt_param_decls
    : /* NULL */
    | param_decls

param_decls
    : param_spec
    | param_decls T_COMMA param_spec

param_spec
    : T_ID T_COLON class_spec
    | class_spec

class_spec
    : base_type

rule_body
    : opt_rule_cond opt_rule_action opt_rule_alt

opt_rule_cond
    : /* NULL */
    | rule_cond_spec

rule_cond_spec
    : T_CONDITION rule_cond

rule_cond
    : expr
    | expr T_BAR expr

opt_rule_action
    : /* NULL */
    | rule_action

rule_action
    : T_ACTION stmts

opt_rule_alt
    : /* NULL */
    | rule_alt

rule_alt
    : T_OTHERWISE stmts

/* executable statements */

stmts
    : stmt
    | stmts stmt

stmt
    : expr T_SEMI
    | null_stmt
    | branch_stmt T_SEMI
    | loop_stmt T_SEMI
    | flow_stmt T_SEMI
    | local_stmt T_SEMI
    | expr T_DO stmts T_END T_SEMI
    | T_BEGIN stmts T_END T_SEMI

null_stmt
    : T_SEMI

branch_stmt
    : if_stmt
    | case_stmt

if_stmt
    : T_IF expr
      T_THEN stmts
      opt_else T_END_IF

opt_else
    : /* NULL */
    | T_ELSE stmts

/* confer with case_stmt in EXPRESS */

case_stmt
    : T_CASE multiple_when
      opt_otherwise
      T_END_CASE

multiple_when
    : when_stmt
    | multiple_when when_stmt

when_stmt
    : T_WHEN expr T_DO stmt

opt_otherwise
    : /* NULL */
    | T_OTHERWISE T_DO stmt

loop_stmt
    : for_stmt
    | while_stmt

for_stmt
    : T_FOR expr T_UNTIL expr T_BY expr
      T_DO stmts T_END_FOR

while_stmt
    : T_WHILE expr T_DO
      stmts T_END_WHILE

flow_stmt
    : T_BREAK
    | T_CONTINUE
    | T_ABORT
    | return_stmt

return_stmt
    : T_RETURN opt_expr

opt_expr
    : /* NULL */
    | expr

/* local variable decls */

opt_local
    : /* NULL */
    | local_stmt

local_stmt
    : T_LOCAL
      var_decls T_END_LOCAL
      T_SEMI

var_decls
    : var_decl
    | var_decls var_decl

var_decl
    : attribute_list T_COLON class_spec T_SEMI


/* method body spec */

method_body_spec
    : method_head
      opt_local
      stmts
      T_END_METHOD T_SEMI

method_head
    : T_METHOD qualified_name T_LEFT_PAREN
      T_RIGHT_PAREN T_SEMI

/* about data types */

underlying_type
    : constructed_types
    | aggregation_types
    | simple_types
    | T_ID

constructed_types
    : enumeration_type
    | select_type

enumeration_type
    : T_ENUMERATION T_OF
      T_LEFT_PAREN identifier_list T_RIGHT_PAREN

select_type
    : T_SELECT T_OF
      T_LEFT_PAREN identifier_list T_RIGHT_PAREN

/* code section */

code_section
    : /* NULL */
    | T_CODES T_COLON code_list

code_list
    : text_block
    | var_code_list

text_block
    : T_BEGIN_TEXT stmts T_END_TEXT

var_code_list
    : var_code T_SEMI
    | var_code_list var_code T_SEMI

var_code
    : T_ID T_ASSIGNMENT code_expression

code_expression
    : text_block
    | expr

/* constant_decl */

constant_decl
    : T_CONSTANT constant_list T_END_CONSTANT
      T_SEMI

constant_list
    : constant_body
    | constant_list constant_body

constant_body
    : T_ID T_COLON base_type T_ASSIGNMENT
      expr T_SEMI

/* program_def */

program_def
    : T_PGM T_ID opt_local stmts
      T_END_PGM T_SEMI


APPENDIX  B 

BNF  OF  THE  MEDIATION  SPECIFICATION  LANGUAGE  (MSL) 


/* Starting Symbol: mediation_file */

mediation_file
    : /* NULL */
    | mediation_file schema_decl

schema_decl
    : schema_header
      schema_body TOK_END_SCHEMA semicolon

schema_header
    : TOK_SCHEMA TOK_IDENTIFIER semicolon

schema_body
    : interface_spec_list schema_block_list

interface_spec_list
    : /* NULL no interface specification */
    | interface_spec_list interface_specification

interface_specification
    : use_clause

use_clause
    : TOK_USE TOK_FROM TOK_IDENTIFIER semicolon
    | TOK_USE TOK_FROM TOK_IDENTIFIER
      TOK_LEFT_PAREN entity_id_list
      TOK_RIGHT_PAREN semicolon

entity_id_list
    : entity_id
    | entity_id_list comma entity_id

entity_id
    : TOK_IDENTIFIER

schema_block_list
    : /* NULL */
    | schema_block_list schema_block

schema_block
    : declaration

declaration
    : entity_decl

entity_decl
    : entity_head
      supertype_declaration semicolon
      entity_body TOK_END_ENTITY semicolon

entity_head
    : TOK_ENTITY TOK_IDENTIFIER

supertype_declaration
    : TOK_ABSTRACT TOK_SUPERTYPE
    | TOK_ABSTRACT TOK_SUPERTYPE TOK_OF
      TOK_LEFT_PAREN supertype_expression
      TOK_RIGHT_PAREN

supertype_expression
    : supertype_factor
    | supertype_expression TOK_AND supertype_factor
    | supertype_expression TOK_ANDOR supertype_factor

supertype_factor
    : super_sch_entity
    | oneof
    | TOK_LEFT_PAREN supertype_expression TOK_RIGHT_PAREN

super_sch_entity
    : TOK_IDENTIFIER T_TWO_COLON TOK_IDENTIFIER

oneof
    : TOK_ONEOF
      TOK_LEFT_PAREN supertype_list TOK_RIGHT_PAREN

supertype_list
    : supertype_expression
    | supertype_list TOK_COMMA supertype_expression


entity_body
    : entity_equ_clause attr_equ_list

entity_equ_clause
    : TOK_ENTITY TOK_EQUIVALENCE TOK_LEFT_PAREN
      sch_entity_list
      TOK_RIGHT_PAREN semicolon

sch_entity_list
    : sch_entity comma sch_entity
    | sch_entity_list comma sch_entity

sch_entity
    : schema_id T_TWO_COLON entity_id

schema_id
    : TOK_IDENTIFIER

entity_id
    : TOK_IDENTIFIER

attr_equ_list
    : attr_equ_clause
    | attr_equ_list attr_equ_clause

attr_equ_clause
    : TOK_ATTRIBUTE TOK_EQUIVALENCE
      TOK_LEFT_PAREN
      sch_entity_attr_list
      TOK_RIGHT_PAREN semicolon
      opt_value_equ_clause
      opt_where_clause

sch_entity_attr_list
    : s_e_a_element comma s_e_a_element
    | sch_entity_attr_list comma s_e_a_element

s_e_a_element
    : sch_entity_attr
    | TOK_LEFT_PAREN s_e_a_list TOK_RIGHT_PAREN

s_e_a_list
    : sch_entity_attr
    | s_e_a_list comma sch_entity_attr

sch_entity_attr
    : schema_id T_TWO_COLON entity_id TOK_DOT attr_id
    | schema_id T_TWO_COLON entity_id
      TOK_BACKSLASH schema_id T_TWO_COLON entity_id
      TOK_DOT attr_id

attr_id
    : TOK_IDENTIFIER

opt_value_equ_clause
    : /* NULL */
    | value_equ_clause

value_equ_clause
    : TOK_VALUE TOK_EQUIVALENCE
      TOK_LEFT_PAREN value_equ_para_list
      TOK_RIGHT_PAREN semicolon

value_equ_para_list
    : value_equ_para
    | value_equ_para_list semicolon value_equ_para

value_equ_para
    : value_equ_type2
    | /* NULL */

value_equ_type2
    : TOK_LEFT_PAREN sch_entity_attr
      comma attr_conv_method_list TOK_RIGHT_PAREN

attr_conv_method_list
    : attr_conv_method
    | attr_conv_method_list comma attr_conv_method

attr_conv_method
    : conv_method_id
      TOK_LEFT_PAREN s_e_a_list
      TOK_RIGHT_PAREN

conv_method_id
    : TOK_IDENTIFIER

opt_where_clause
    : /* NULL */
    | where_clause

where_clause
    : TOK_WHERE
      domain_rule_list

domain_rule_list
    : domain_rule
    | domain_rule_list domain_rule

domain_rule
    : expression semicolon

expression
    : simple_expression
    | expression rel_op_extended simple_expression

rel_op_extended
    : rel_op

rel_op
    : TOK_LESS_THAN
    | TOK_GREATER_THAN
    | TOK_LESS_EQUAL
    | TOK_GREATER_EQUAL
    | TOK_NOT_EQUAL
    | TOK_EQUAL
    | TOK_INST_NOT_EQUAL
    | TOK_INST_EQUAL

simple_expression
    : term
    | simple_expression add_like_op term

add_like_op
    : TOK_PLUS
    | TOK_MINUS
    | TOK_OR
    | TOK_XOR

term
    : factor
    | term multiplication_like_op factor

multiplication_like_op
    : TOK_TIMES
    | TOK_REAL_DIV
    | TOK_CONCAT_OP
    | TOK_AND
    | TOK_DIV
    | TOK_MOD

factor
    : simple_factor TOK_EXP simple_factor
    | simple_factor

simple_factor
    : TOK_NOT TOK_LEFT_PAREN
      expression TOK_RIGHT_PAREN
    | TOK_LEFT_PAREN
      expression TOK_RIGHT_PAREN
    | unary_op primary
    | primary

unary_op
    : TOK_PLUS
    | TOK_MINUS
    | TOK_NOT

primary
    : literal
    | identifier

literal
    : TOK_BINARY_LITERAL
    | TOK_INTEGER_LITERAL
    | TOK_LOGICAL_LITERAL
    | TOK_REAL_LITERAL
    | TOK_STRING_LITERAL

identifier
    : token qualifier_list

token
    : TOK_IDENTIFIER
    | TOK_SELF TOK_DOT TOK_IDENTIFIER

qualifier_list
    : /* NULL */
    | qualifier_list qualifier

qualifier
    : attribute_qualifier
    | group_qualifier
    | class_qualifier

attribute_qualifier
    : TOK_DOT TOK_IDENTIFIER

group_qualifier
    : TOK_BACKSLASH TOK_IDENTIFIER

class_qualifier
    : T_TWO_COLON TOK_IDENTIFIER


APPENDIX  C 
RESULT  OF  A  GLOBAL  QUERY  EXECUTION 


Complete execution results of the query:

CONTEXT s:DB1::Stock
WHERE s.Date >= '1/1/1995' AND s.StkCode = 'IBM'
RETRIEVE (s.Date, s.TradePrice);


CLIENT: Sending a query to DQP

DQP: Query received,
context DB_1::Stock
where (DB_1::Stock.Date>"1/1/95")and(DB_1::Stock1.StkCode="IBM")
retrieve (DB_1::Stock.Date,DB_1::Stock1.TradePrice);

DQP: One subquery is generated.

KBMS: Triggered by a mediation rule
3 information source(s), DB_1, DB_2 and DB_3, have been located.

DQP: Send subquery to DB_1_SQP

DB_1_SQP: Query received
context DB_1::Stock
where (DB_1::Stock.Date>"1/1/95")and(DB_1::Stock1.StkCode="IBM")
retrieve (DB_1::Stock.Date,DB_1::Stock1.TradePrice);
TARGET SYSTEM:DB_1;

DB_1_SQP: No change necessary

DB_1_SQP: Processing query in DB_1
DB_1::Stock.Date    DB_1::Stock1.TradePrice
1/1/95              20\6
1/2/95              21\5
1/3/95              22\3
1/4/95              23\1

DB_1_Mediator: Data conversion begins.
DB_1::Stock.Date: No conversion needed
DB_1::Stock.TradePrice: No conversion needed

DQP: Data received from DB_1_SQP (4 instances)
DB_1::Stock.Date    DB_1::Stock1.TradePrice
1/1/95              20\6
1/2/95              21\5
1/3/95              22\3
1/4/95              23\1

DQP: Send subquery to DB_2_SQP

DB_2_SQP: Query received
context DB_1::Stock
where (DB_1::Stock.Date>"1/1/95")and(DB_1::Stock1.StkCode="IBM")
retrieve (DB_1::Stock.Date,DB_1::Stock1.TradePrice);
TARGET SYSTEM:DB_2;

DB_2_SQP: Query needs to be modified.

DB_2_Mediator: Triggered to modify the query

DB_2_SQP: Modified query
context DB_2::Stock
where (DB_2::Stock.Date>"1/1/95")
retrieve (DB_2::Stock.Date,DB_2::Stock2.IBM);
TARGET SYSTEM:DB_2;

DB_2_SQP: Processing query in DB_2
DB_2::Stock.Date    DB_2::Stock2.IBM
2/1/95              11.6
2/2/95              11.5
2/3/95              12.3
2/4/95              13

DB_2_Mediator: Data conversion begins.
DB_2::Stock.Date -> DB_1::Stock.Date
DB_2::Stock.IBM -> DB_1::Stock.TradePrice
11.6 -> 11\09
11.5 -> 11\08
12.3 -> 12\04
13 -> 13\00

DQP: Data received from DB_2_SQP (4 instances)
DB_1::Stock.Date    DB_1::Stock1.TradePrice
2/1/95              11\09
2/2/95              11\08
2/3/95              12\04
2/4/95              13\00

DQP: Send subquery to DB_3_SQP

DB_3_SQP: Query received
context DB_1::Stock
where (DB_1::Stock.Date>"1/1/95")and(DB_1::Stock1.StkCode="IBM")
retrieve (DB_1::Stock.Date,DB_1::Stock1.TradePrice);
TARGET SYSTEM:DB_3;

DB_3_SQP: Query needs to be modified.

DB_3_Mediator: Triggered to modify the query

DB_3_SQP: Modified query
context DB_3::IBM
where (DB_3::IBM.Date>"1/1/95")
retrieve (DB_3::IBM.Date,DB_3::IBM.StockPrice);
TARGET SYSTEM:DB_3;

DB_3_SQP: Processing query in DB_3
DB_3::IBM.Date      DB_3::IBM.StockPrice
3/1/95              14.6
3/2/95              15.5
3/3/95              14.3
3/4/95              16

DB_3_Mediator: Data conversion begins.
DB_3::IBM.Date -> DB_1::Stock.Date
DB_3::IBM.StockPrice -> DB_1::Stock.TradePrice
14.6 -> 14\09
15.5 -> 15\08
14.3 -> 14\04
16 -> 16\00

DQP: Data received from DB_3_SQP (4 instances)
DB_1::Stock.Date    DB_1::Stock1.TradePrice
3/1/95              14\09
3/2/95              15\08
3/3/95              14\04
3/4/95              16\00

DQP: Assemble data of the 3 subqueries.

DQP: Return data to CLIENT

CLIENT: Receiving data from DQP
DB_1::Stock.Date    DB_1::Stock1.TradePrice
1/1/95              20\6
1/2/95              21\5
1/3/95              22\3
1/4/95              23\1
2/1/95              11\09
2/2/95              11\08
2/3/95              12\04
2/4/95              13\00
3/1/95              14\09
3/2/95              15\08
3/3/95              14\04
3/4/95              16\00

END
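
The price conversions in the trace above (for example 11.6 -> 11\09 and 12.3 -> 12\04) are consistent with rewriting the decimal fraction as a truncated count of sixteenths. The sketch below reproduces the eight conversions shown; the function name to_sixteenths is hypothetical, and both the unit (sixteenths) and the truncation are inferred from the trace rather than taken from the mediator's actual conversion method.

```python
from decimal import Decimal


def to_sixteenths(price: str) -> str:
    """Rewrite a non-negative decimal price as 'whole\\NN', NN in sixteenths.

    Assumption: the fraction is truncated, not rounded, because the
    trace shows 11.6 -> 11\\09 (0.6 * 16 = 9.6) and 12.3 -> 12\\04.
    """
    value = Decimal(price)             # exact decimal arithmetic, no float error
    whole = int(value)                 # integer part of the price
    sixteenths = int((value - whole) * 16)  # truncate toward zero
    return f"{whole}\\{sixteenths:02d}"


if __name__ == "__main__":
    for raw in ["11.6", "11.5", "12.3", "13", "14.6", "15.5", "14.3", "16"]:
        print(raw, "->", to_sixteenths(raw))
```

Decimal is used instead of float so that a value such as 0.5 times 16 is computed exactly; with binary floating point the truncation step could be off by one for some inputs.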